Table of Contents

1. General introduction

This wiki outlines the procedures for running the MIT General Circulation Model (MITgcm) Hg/POPs simulations on the Odyssey system. General information about the MITgcm can be found in the MITgcm User's Manual.

We have one type of simulation so far:

1) A nominal 1 degree x 1 degree online simulation with ECCOv4 ocean circulation data over a global domain with higher spatial resolution over the Arctic Ocean.

2. Obtain source code

Users from the Harvard BGC group can obtain a copy of the source code from:

/n/sunderland_lab/Lab/MITgcm/

Note: Do NOT copy the verification folder; it takes up a huge amount of disk space.

In your ~username home directory make an MITgcm directory and copy all of the folders except verification into your MITgcm directory from the Lab copies. For example:

cd

mkdir MITgcm

cd MITgcm

cp -r /n/sunderland_lab/Lab/MITgcm/bin/ .

cp -r /n/sunderland_lab/Lab/MITgcm/doc/ .

cp -r /n/sunderland_lab/Lab/MITgcm/eesupp/ .

...etc.!
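The folder-by-folder copy above can also be written as a single loop. The following is a minimal sketch, not a lab-provided script: the function name copy_mitgcm_tree is hypothetical, and it assumes that every top-level entry other than verification/ should be copied.

```shell
#!/bin/sh
# Hypothetical helper (not part of MITgcm): copy every top-level entry of
# a source tree into a destination, skipping the large verification/ folder.
copy_mitgcm_tree() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for entry in "$src"/*; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$entry" ] || continue
        name=$(basename "$entry")
        if [ "$name" = "verification" ]; then
            continue
        fi
        cp -r "$entry" "$dest"/ || return 1
    done
}
```

Under these assumptions, a call such as copy_mitgcm_tree /n/sunderland_lab/Lab/MITgcm ~/MITgcm would reproduce the manual steps.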

...

Users from the Harvard BGC group can obtain a copy of the source code from the MITgcm_code repository on Bitbucket. See the initial setup instructions at Running the MITgcm PFOS and Hg Simulations on Odyssey for details.

The numerical model is contained within an execution environment support wrapper. This wrapper is designed to provide a general framework for grid-point models. MITgcm is a specific numerical model that uses the framework. Under this structure, the model is split into execution environment support code and conventional numerical model code. The execution environment support code is in the eesupp/ directory. The grid-point model code is in the model/ directory. Code execution actually starts in the eesupp/ routines and not in the model routines. For this reason, the top-level MAIN.F is in the eesupp/src/ directory. In general, end-users should not need to worry about this level. The top-level routine for the numerical part of the code is in model/src/THE_MODEL_MAIN.F.

...

  1. Copy your code package as a separate folder in MITgcm_code/pkg/ (e.g., MITgcm_code/pkg/pfos/). If you don't know how to develop such a package, a good template to follow is the hg package in /home/geos_harvard/yanxu/MITgcm/pkg/hg/. Generally, you need to write a series of functions in your package that solve different terms of the continuity equation. The physical transport is handled by the ptracer/ package, so you just need to focus on the source-sink terms of your pollutant(s) of interest. You also need a couple of header files to define a series of variables and some files to handle the disk I/O.
  2. Hook up your code with the main program via the gchem/ package. You should modify several files, including:
    1. gchem_calc_tendency.F: from here you can call the functions that solve different biogeochemical processes, e.g. chemistry, surface forcing, partitioning.
    2. gchem_fields_load.F: from here you can call the function to load input data.
    3. GCHEM.h: add a trigger to enable your package, such as useHG, usePCB.

...

e. gchem_init_fixed.F: from here you can call the function which handles initializing diagnostics.

You can also refer to my modifications to the gchem/ package at /home/geos_harvard/yanxu/MITgcm/pkg/gchem. Search for the modifications with grep -i "yxz" *.
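When searching for tagged modifications, it can help to see the file name and line number of each match. The following is a minimal sketch: the function name find_tagged_lines is hypothetical, and it assumes a grep that supports the recursive -r option (GNU and BSD grep both do).

```shell
#!/bin/sh
# Hypothetical helper: list every line in a directory tree that contains a
# given tag, case-insensitively, prefixed by file name and line number.
find_tagged_lines() {
    tag=$1
    dir=$2
    grep -rin -- "$tag" "$dir"
}
```

For example, find_tagged_lines yxz /home/geos_harvard/yanxu/MITgcm/pkg/gchem would print each match as path:line:text.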

...

  1. Need to comment out all calls to ALLOW_CAL in pcb_fields_load.F
  2. In gchem_init_fixed.F, you need to make sure you have the line: CALL PCB_PARM. Yanxu got rid of his hg_parms.F file, so a CALL HG_PARM line is missing from his gchem_init_fixed.F file. The PCB simulation still has a pcb_parms.F file, and if it isn't "turned on" by calling it from gchem_init_fixed.F, then your output will be all NaNs.
  3. Use online wind, ice, and solar radiation information from ECCOv4. In ECCOv1, we read wind, ice, and radiation from an offline file (e.g., archived from MERRA or GEOS-5). Now those variables are generated online. You need to do two things to activate this capability:
    1. Add "#ifdef USE_EXFIWR" statements to your package. The easiest way to do this is to search for "USE_EXFIWR" in the HG code (/n/sunderland_lab/MITgcm/pkg/hg/) and copy these to your own code.

    2. After adding the "#ifdef USE_EXFIWR" statements to your package, you need to update the names of your ice, wind, and radiation variables. You probably need to do this if your code has air-sea exchange, ice interactions, or photochemistry. In pcba_surfforcing.F, which handles air-sea exchange, I had to replace wind(i,j,bi,bj) with windo and fIce(i,j,bi,bj) with ice. If you haven't done this properly, your PTRACER output might have big blocks missing, like this:

...

Here we will set up the following directories within your ~username/MITgcm/verification/global_hg_llc90/ directory:

code: Header/option or other files that are often modified.

...

Before compiling the code, you need to obtain the content of code/ directory. Copy all the files in /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/code/ :

cd ~username/MITgcm/verification/global_hg_llc90/

cp -r /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/code/ ./

Lastly, make empty build/ and run/ directories within your ~username/MITgcm/verification/global_hg_llc90/ directory:

cd ~username/MITgcm/verification/global_hg_llc90/ 

mkdir build

mkdir run

If you are running the Hg simulation, you should be all set. If you are running a different simulation (e.g., PCBs or PFOS) and only using Hg as a template, you need to modify:

...

*****Also, a WARNING: the MITgcm Hg simulation has NOT successfully run with the module configuration below on Odyssey. The error is described in section 7, here. This page will be updated as more is understood, but in the meantime please use the modules and optfile listed here, under section (b) ('used by Chris Horvat'), which have been tested and do run on Odyssey. *****

module load hpc/openmpi-intel-latest

module load hpc/netcdf-3.6.3

Then let's go to the build/ directory and build your Makefile:

cd ~username/MITgcm/verification/global_hg_llc90/build

First, build the Makefile. Note: the "-optfile" filename below (and contents) will need to be changed if you have to load different module versions than the specific ones listed above.

make clean        Note: this is needed if you change which modules are loaded and/or the optfile

../../../tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_ia64_ifort+mpi_harvard3      

...

/n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/

6.1 Copy first batch of

...

files 

Copy these folders to ~username/MITgcm/verification/

cd ~username/MITgcm/verification/

cp -r /n/sunderland_lab/Lab/MITgcm/verification/global_oce_cs32/ ./

cp -r /n/sunderland_lab/Lab/MITgcm/verification/global_oce_input_fields/ ./

...

cd ~username/MITgcm/verification/global_hg_llc90/run

ln -s /n/sunderland_lab/Lab/eccov4_input/controls/* .

ln -s /n/sunderland_lab/Lab/eccov4_input/MITprof/* .

ln -s /n/sunderland_lab/Lab/eccov4_input/pickups/* .

ln -s /n/sunderland_lab/Lab/eccov4_input/era-interim/* .
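The four ln -s commands above follow the same pattern, so they can be expressed as one loop. The following is a minimal sketch: the function name link_input_dirs is hypothetical, and it assumes the base directory is given as an absolute path so that the links resolve from anywhere.

```shell
#!/bin/sh
# Hypothetical helper: symlink every file from each listed subdirectory of
# BASE into DEST, mirroring the individual ln -s commands above.
# BASE should be an absolute path so the links stay valid.
link_input_dirs() {
    base=$1
    dest=$2
    shift 2
    for sub in "$@"; do
        for f in "$base/$sub"/*; do
            # Skip the unexpanded pattern if a subdirectory is empty.
            [ -e "$f" ] || continue
            ln -s "$f" "$dest"/ || return 1
        done
    done
}
```

From the run/ directory, the equivalent call would be link_input_dirs /n/sunderland_lab/Lab/eccov4_input . controls MITprof pickups era-interim.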

...

mv EIG* forcing/

cp /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/forcing/runoff-2d-Fekete-1deg-mon-V4-SMOOTH.bin forcing/.

6.4 Initial conditions and other forcing files

Still in your run/ directory, make an initial/ subdirectory: 

mkdir initial/

In this folder, you can put the initial conditions of your tracers. If you have not run the model before, you must link to these files from sunderland_lab as follows:

cd initial/

ln -s /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/initial/* .

(Helpful tip: if later you want to change the initial conditions files/filenames, you must edit them in your data.ptracers file within your run directory; data files are copied in step 6.6)

Go back to your run directory and make another directory called input_hg/ for Hg deposition input from the atmosphere:

cd ..       (to get back to your run directory, assuming you were in run/initial/)

mkdir input_hg

Now fill it with your input files. If you do not have any, use the input files from sunderland_lab:

 ln -s /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/input_hg/* input_hg/.     

If you are running with the food web model (which is the default setting when you copy the code/ directory from sunderland_lab; open ~username/MITgcm/verification/global_hg_llc90/code/HG_OPTIONS.h and check whether this is set to "define"), you will need to get plankton inputs.

Still in your run directory:

mkdir input_darwin

cp /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/input_darwin/* input_darwin/.

6.5 Control files

Still in your run/ directory, make a control/ subdirectory:

mkdir control

Move all the control files into this folder:

mv xx_* control/

cp /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/control/wt_* control/

6.6 data* files

If you're running an Hg simulation, copy data* files to your run/ directory from here:

cp /n/home09/hmh/MITgcm/verification/global_hg_llc90/run/data* .

Note: within the file called "data" are variables that set the length of the run (in number of timesteps) and the length of a timestep (in seconds).

If you're running the PCB and DARWIN simulations, copy data* files to your run/ directory from Svante. If you don't know how, Helen Amos might help you with this. The bottom line is that you cannot reuse the old files from the older ECCO version simulations.

6.7 Submit job

Copy the submit script into run/, and rename it to anything you like:

cp /n/home09/hmh/MITgcm/verification/global_hg_llc90/run/run.mehg .

Then we can submit the job to the queue. To submit:

sbatch YOUR_RUN_SCRIPT

If your run finishes without any problems, the very last line of your STDOUT.0000 file should indicate the model 'ENDED NORMALLY'.

An example run script for running an 8-hour test run with 1-hour timesteps is located here:

cp /n/sunderland_lab/Lab/run_original_fixed/run.8hr.testrun.chrismodules.96core

You may need more than 1 hour for the run to complete; 120 minutes is conservative. This is for the data file configuration located here that sets the timesteps and length to 8 hours:

...

_original_fixed/data
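The end-of-run check on STDOUT.0000 described in section 6.7 can be scripted. The following is a minimal sketch: the function name run_ended_normally is hypothetical, and the match is made case-insensitive on the assumption that the exact capitalization of the message may differ from the 'ENDED NORMALLY' wording used on this page.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the last line of a STDOUT file
# reports a normal end. Returns 2 if the file does not exist.
# The case-insensitive match is an assumption about the message format.
run_ended_normally() {
    stdout_file=${1:-STDOUT.0000}
    [ -f "$stdout_file" ] || return 2
    tail -n 1 "$stdout_file" | grep -qi 'ended normally'
}
```

Run from your run/ directory, run_ended_normally && echo "run OK" prints the message only when the last line of STDOUT.0000 matches.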

6.8 How to check on your run during and after completion

...

sacct -j JOBID --format=JobID,JobName,ReqMem,MaxRSS,Elapsed

To learn how to use the output from the above, see https://rc.fas.harvard.edu/resources/odyssey-quickstart-guide/ section "A note on requesting memory".

6.9 Debugging tips 

...

1. If your run crashes, check the following files for error messages:

...

  1. Go to your code/ directory. 
  2. Make a copy of your SIZE.h file and rename it SIZE.h.96np. Now when you need to go back to 96 cores, your SIZE.h file information is saved.
  3. Open SIZE.h
  4. In SIZE.h, set sNx and sNy to 90
  5. In SIZE.h, set nPx to 13
  6. Save and close SIZE.h
  7. Edit your optfile of choice within ~username/MITgcm/tools/build_options/ to have "-mcmodel=medium -shared-intel" within $FFLAGS.
  8. Go to your build/ directory. 
  9. Recompile your code. 
  10. Move the mitgcmuv* executable to your run/ directory.
  11. Go to your run directory.
  12. Make a copy of data.exch2 and rename it data.exch2.96np. Now when you need to go back to 96 cores, it's saved.
  13. Open data.exch2.
  14. Comment out the line blanklist = XXXX, where XXXX will be a list of numbers.
  15. Save and close data.exch2
  16. Make a copy of your run script and rename it <your_run_script.csh.96np>. Now when you need to go back to 96 cores, your run script is saved.
  17. Open your run script
  18. Change the number of cores you're requesting from 96 to 13 in your run script. 

  19. Submit your job.
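Several of the steps above save a copy of a file with a .96np suffix before editing it. The following is a minimal sketch of that backup step: the function name backup_config is hypothetical, and refusing to overwrite an existing backup is a design choice added here, not something the steps require.

```shell
#!/bin/sh
# Hypothetical helper: save a copy of FILE as FILE.SUFFIX before editing it,
# refusing to overwrite an existing backup so the 96-core version is not lost.
backup_config() {
    file=$1
    suffix=${2:-96np}
    if [ -e "$file.$suffix" ]; then
        echo "backup $file.$suffix already exists" >&2
        return 1
    fi
    cp -p "$file" "$file.$suffix"
}
```

For example, backup_config SIZE.h in code/ and backup_config data.exch2 in run/ would cover steps 2 and 12.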

6.9 Final remarks

Documentation for ECCOv4, the physical model configuration of our simulations [pdf], and the associated publication [pdf].

Processing model output and regridding input files involves the gcmfaces/ package. Documentation for gcmfaces/ is available here [pdf].

Special thanks to Gael Forget, Stephanie Dutkiewicz and Jeffery Scott at MIT. 

7. Issues to watch out for - check here if you have problems


Is your run timing out, yet not getting past the first timestep / not making any output? Do you get a message in your .out file that looks like this: 

Rosenbrock: Unsucessful step at T= 0.000000000000000E+000 (IERR= -7 
)
just here: 0.900000000000000 NaN
1.00000000000000 4.00000000000000

This is an issue with the KPP chemical solver not converging, and it requires a change in how the code is compiled. More will be added later, but for now follow the instructions for loading modules here (section (b), Chris Horvat) using the optfile available for download here.

 

Is your run crashing because of diagnostics issues?

...

If the link is not to a specific file within the /n/sunderland_lab directory, you may need to re-link! For example, if it looks like this: "../../global_oce_input_fields/ecmwf//EIG_dsw_1992" instead of "/n/sunderland_lab/Lab/eccov4_input/era-interim/EIG_dsw_1992". Note: this only seems to be a problem sometimes, so don't be concerned if your links look weird, only if your run crashes looking for a linked file.
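One way to spot links that need re-linking is to list the symbolic links whose targets do not exist. The following is a minimal sketch: the function name list_broken_links is hypothetical, and it only detects links that are actually broken, not links that merely look unusual.

```shell
#!/bin/sh
# Hypothetical helper: print every symbolic link under DIR whose target
# does not exist. "test -e" follows the link, so it fails for broken links.
list_broken_links() {
    dir=${1:-.}
    find "$dir" -type l ! -exec test -e {} \; -print
}
```

Running list_broken_links from your run/ directory prints nothing when every link resolves.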

Do you have no idea what is going on? Check here for more general info, including how to get in touch with MITgcm support.

8. Appendix 1: Odyssey modules and optfile working combinations for compiling MITgcm

...

You should keep all your optfiles within your directory: ~username/MITgcm/tools/build_options/ .

1. Old module system:

a. Standard, as written in the instructions above. The optfile is already in your build_options directory if you followed the instructions to copy folders from sunderland_lab.

module load hpc/openmpi-intel-latest

module load hpc/netcdf-3.6.3

../../../tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_ia64_ifort+mpi_harvard3

Note: this will load the following versions: intel compiler 13.0.079; openmpi 1.6.2. 

b. Used by Chris Horvat. Download the optfile by clicking here, use scp to copy the file to Odyssey, then mv it into your build_options directory. Alternatively, copy it from /n/home09/hmh/MITgcm/tools/build_options/linux_amd64_ifort_mpi_odyssey2

module load centos6/openmpi-1.7.2_intel-13.0.079

module load centos6/netcdf-4.3.0_intel-13.0.079

../../../tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_amd64_ifort_mpi_odyssey2 -mpi -enable=mnc

2. New module system:

Download optfile by clicking here.

...

1. Load Lmod, Odyssey's new module system. At the command line, in any directory, enter:

source new-modules.sh

2. Load intel compiler:

module load intel/13.0.079-fasrc01

3. Find out which modules are compatible with this intel version:

module avail

Right now, the list looks something like this:

openmpi/1.6.5-fasrc01 
openmpi/1.8.1-fasrc01
openmpi/1.8.3-fasrc01
netcdf/3.6.3-fasrc01

This means you can choose any of the 3 openmpi versions, but there is only one compatible netCDF version.

 4. Load your openmpi module of choice and netCDF module. As an example, here we'll choose openmpi 1.6.5.

module load openmpi/1.6.5-fasrc01

module load netcdf/3.6.3-fasrc01

 5. Find out what the filepaths are for these modules:

printenv

Now look for  "LD_LIBRARY_PATH" and "CPATH" (search within the terminal window). For the modules above, it should look something like this:

 

LD_LIBRARY_PATH=/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/netcdf/3.6.3-fasrc01/lib64:/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/lib:/n/sw/intel_cluster_studio-2013/lib/intel64:/lsf/7.0/linux2.6-glibc2.3-x86_64/lib
CPATH=/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/netcdf/3.6.3-fasrc01/include:/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/include:/n/sw/intel_cluster_studio-2013/composerxe/include/intel64:/n/sw/intel_cluster_studio-2013/composerxe/include
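Because these variables are long colon-separated lists, it can be easier to read them one entry per line. The following is a minimal sketch: the function name print_path_entries is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the entries of a colon-separated environment
# variable (for example LD_LIBRARY_PATH or CPATH), one per line.
print_path_entries() {
    var_name=$1
    printenv "$var_name" | tr ':' '\n'
}
```

For example, print_path_entries LD_LIBRARY_PATH | grep -E 'openmpi|netcdf' shows only the entries for the modules loaded in step 4.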

6. Create a new optfile, by making a copy of a previous one, within your ~username/MITgcm/tools/build_options/ directory.

cd ~username/MITgcm/tools/build_options/

cp linux_ia64_ifort+mpi_harvard3 linux_ia64_ifort+mpi_harvard_test             (just an example, can change filename to whatever you want)

7. Open the file you've just copied (e.g., with emacs, nano, vi, or whatever text editor), and look for the following lines, which you will want to edit (Note, they may be slightly different, this is an example): 

INCLUDES='-I/n/sw/openmpi-1.6.2_intel-13.0.079/include -I/n/sw/intel_cluster_studio-2013/mkl/include'

...

cd /n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/include

5. More information on Odyssey modules & useful commands:

https://rc.fas.harvard.edu/resources/documentation/software-on-odyssey/modules/

module purge - clears all loaded modules

module list - shows currently loaded modules