MITgcm ECCOv4 Svante Guide

1. General introduction

This wiki outlines the procedures for running MIT General Circulation Model (MITgcm) Hg/POPs simulations on MIT's server system, Svante. General information about the MITgcm can be found in the MITgcm User's Manual.

We have one type of simulation so far:

1) A nominal 1 degree x 1 degree online simulation with ECCOv4 ocean circulation data over a global domain, with higher spatial resolution over the Arctic Ocean.

2. Obtain source code

Users from the Harvard BGC group can obtain a copy of the source code from:

/home/geos_harvard/yanxu/MITgcm (Note: do NOT copy the verification/ folder; it takes up a huge amount of disk space.)

For users outside this group, we are currently working on a GitHub site.

The numerical model is contained within an execution environment support wrapper. This wrapper is designed to provide a general framework for grid-point models; MITgcm is a specific numerical model that uses the framework. Under this structure, the model is split into execution environment support code and conventional numerical model code. The execution environment support code is in the eesupp/ directory. The grid-point model code is in the model/ directory. Code execution actually starts in the eesupp/ routines and not in the model routines. For this reason, the top-level MAIN.F is in the eesupp/src/ directory. In general, end-users should not need to worry about this level. The top-level routine for the numerical part of the code is in model/src/THE_MODEL_MAIN.F.

 

Here is a brief description of the directory structure of the model under the root tree: 

doc: Contains brief documentation notes.

eesupp: Contains the execution environment source code, subdivided into two subdirectories, inc/ and src/.

model: This directory contains the main source code, also subdivided into two subdirectories, inc/ and src/.

pkg: Contains the source code for the packages. Each package corresponds to a subdirectory. For example, gmredi/ contains the code related to the Gent-McWilliams/Redi scheme.

tools: This directory contains various useful tools. For example, genmake2 is a script written in csh (C-shell) used to generate your Makefile. The directory adjoint/ contains the Makefile specific to the Tangent linear and Adjoint Compiler (TAMC) that generates the adjoint code. The tools/ directory also contains the subdirectory build_options/, which contains the 'optfiles' with the compiler options for the different compilers and machines that can run MITgcm.

utils: This directory contains various utilities. The subdirectory knudsen2/ contains code and a Makefile that compute coefficients of the polynomial approximation to the knudsen formula for an ocean nonlinear equation of state. The matlab/ subdirectory contains MATLAB scripts for reading model output directly into MATLAB. The scripts/ directory contains C-shell post-processing scripts for joining processor-based and tiled-based model output. The subdirectory exch2/ contains the code needed for the exch2 package to work with different combinations of domain decompositions.

jobs: Contains sample job scripts for running MITgcm.

lsopt: Line search code used for optimization.

optim: Interface between MITgcm and line search code.

3. Insert your own package or migrate a package from ECCOv1 to ECCOv4

If you want to add a chemical tracer simulation (e.g., Hg, PCBs, PFOS), please follow the instructions below:

  1. Copy your code package as a separate folder in MITgcm/pkg/ (e.g., MITgcm/pkg/pfos/). If you don't know how to develop such a package, a good template to follow is the hg package in /home/geos_harvard/yanxu/MITgcm/pkg/hg/. Generally, you need to write a series of routines in your package that solve the different terms of the continuity equation. The physical transport is handled by the ptracers/ package, so you just need to focus on the source and sink terms of your pollutant(s) of interest. You also need a couple of header files to define a series of variables, plus some files to handle the disk I/O.
  2. Hook your code up to the main program via the gchem/ package. You will need to modify several files, including:
    1. gchem_calc_tendency.F: from here you can call the routines that solve the different biogeochemical processes, e.g., chemistry, surface forcing, partitioning.
    2. gchem_fields_load.F: from here you can call the routine that loads input data.
    3. GCHEM.h: add a flag to enable your package, such as useHG or usePCB.
    4. gchem_readparms.F: from here you can call the routine that initializes parameters.
    5. gchem_init_fixed.F: from here you can call the routine that initializes diagnostics.

You can also refer to my modifications to the gchem/ package at /home/geos_harvard/yanxu/MITgcm/pkg/gchem; you can find the modifications by running grep -i "yxz" * in that directory.
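For example, here is a minimal sketch of setting up a new package from the hg template (the package name 'pfos' and the destination paths are hypothetical; adjust them to your own case):

cp -r /home/geos_harvard/yanxu/MITgcm/pkg/hg MITgcm/pkg/pfos       (start from the hg package as a template)
cd MITgcm/pkg/pfos
grep -li "hg" *                                                    (list files that still carry the hg names and need renaming)
grep -in "yxz" /home/geos_harvard/yanxu/MITgcm/pkg/gchem/*         (see exactly where the gchem hooks were added)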

 

Tips & tricks if you're migrating your own package to ECCOv4

Here are issues that came up when migrating the PCB simulation from ECCOv1 on Harvard's Odyssey server to ECCOv4 on MIT's Svante server:

  1. You need to comment out all calls to ALLOW_CAL in pcb_fields_load.F.
  2. In gchem_init_fixed.F, you need to make sure you have the line CALL PCB_PARM. Yanxu got rid of his hg_parms.F file, so a CALL HG_PARM line is missing from his gchem_init_fixed.F file. The PCB simulation still has a pcb_parms.F file, and if it isn't "turned on" by calling it from gchem_init_fixed.F, then your output will be all NaNs.
  3. Use online wind, ice, and solar radiation information from ECCOv4. In ECCOv1, we read wind, ice, and radiation from an offline file (e.g., archived from MERRA or GEOS-5). Now those variables are generated online. You need to do two things to activate this capability:
    1. Add "#ifdef USE_EXFIWR" statements to your package. The easiest way to do this is to search for "USE_EXFIWR" in the PCB code (/home/geos_harvard/helen/MITgcm_ECCOv4/pkg/pcb/) and copy these statements into your own code (see the example command after this list).
    2. After adding the "#ifdef USE_EXFIWR" statements to your package, you need to update the names of your ice, wind, and radiation variables. You probably need to do this if your code has air-sea exchange, ice interactions, or photochemistry. In pcba_surfforcing.F, which handles air-sea exchange, I had to replace wind(i,j,bi,bj) with windo and fIce(i,j,bi,bj) with ice. If you haven't done this properly, your PTRACER output might have big blocks of missing data.
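For example, a quick way to locate all the relevant statements in the PCB code (path as given above):

grep -in "USE_EXFIWR" /home/geos_harvard/helen/MITgcm_ECCOv4/pkg/pcb/*        (prints each file and line where the flag is used)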

 

4. Make a working directory

A working directory should be made alongside the code directory, and a dedicated working directory should be made for each simulation you plan to run. In the working directory, you should also make three subfolders (example commands follow the list below):

code: Header/option or other files that are often modified.

build: Where the make program puts the intermediate source code.

run: The run directory.
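For example (YOUR_WORKING_DIR stands for whatever you name your working directory):

mkdir YOUR_WORKING_DIR
cd YOUR_WORKING_DIR
mkdir code build run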

 

Before compiling the code, you need to populate the code/ directory. Copy all the files in /home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/code/ into your code/ directory:

cd YOUR_WORKING_DIR/code/

cp -r /home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/code/* ./


If you are running the Hg simulation, you should be all set. If you are running a different simulation (e.g., PCBs or PFOS) and only using Hg as a template, you need to modify:

X_OPTIONS.h       : Where 'X' needs to be renamed to match your chemical package (e.g. PCB_OPTIONS.h) and the contents should match the options in your package code (e.g., in pkg/pcb/). 

packages.config   : Search and replace 'hg' with your package name (e.g., 'pcb').

X_SIZE.h               : *Special instructions!* If you have copied Yanxu's Hg source code, you do not need to add an HG_SIZE.h file to your code/ directory; your HG_SIZE.h file is already in pkg/hg/. If you are running PCBs, you need to add a PCB_SIZE.h file to your code/ directory; you can copy one from /home/geos_harvard/helen/MITgcm_ECCOv4/verification/global_pcb_llc90/code/. If you are running another package (e.g., PFOS), check your pkg/pfos/ directory; if you don't see a PFOS_SIZE.h file, then you need to add one to your code/ directory. A condensed example for a PCB run is given below.
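A sketch for a PCB run, assuming you start from the Hg code/ files copied above (the file names and the sed pattern are illustrative; check the results by hand):

cd YOUR_WORKING_DIR/code
mv HG_OPTIONS.h PCB_OPTIONS.h                      (then edit the options to match those in pkg/pcb/)
sed -i 's/\bhg\b/pcb/g' packages.config            (swap the package name)
cp /home/geos_harvard/helen/MITgcm_ECCOv4/verification/global_pcb_llc90/code/PCB_SIZE.h .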

 

5. Compiling the code

Before compiling, you also need to load the proper compilers and libraries for the code (MPI, Intel Fortran, netCDF, etc.):

module load gcc

module load intel

module load netcdf/20130909

module load openmpi/1.7.2

export MPI_INC_DIR="/home/software/intel/intel-2013_sp1.0.080/pkg/openmpi/openmpi-1.7.2/include/"

export NETCDF_ROOT="/home/software/intel/intel-2013_sp1.0.080/pkg/netcdf/netcdf-20130909/"

You can also copy a text file which contains the above content at 

/home/geos_harvard/yanxu/MITgcm/preload

and run the following command before you compile your code.

source /your/path/to/preload

Then let's go to the build/ directory:

cd YOUR_WORKING_DIR/build

First, build the Makefile:

SOURCE_DIRECTORY/tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_amd64_ifort11 -mpi

The command line option tells genmake to override model source code with any files in the directory ../code/. I have written an alias called 'premake' in my .bashrc file to replace this long genmake command. If you copy my 'premake' alias into your own .bashrc file, then you would type:

premake
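For reference, a sketch of what such an alias might look like in your .bashrc (replace SOURCE_DIRECTORY with the path to your MITgcm source tree):

alias premake='SOURCE_DIRECTORY/tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_amd64_ifort11 -mpi'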

Once a Makefile has been generated, we create the dependencies with the command:

make depend

This modifies the Makefile by attaching a (usually long) list of files upon which other files depend. The purpose of this is to reduce re-compilation if and when you start to modify the code. The make depend command also creates links from the model source to this directory. It is important to note that the make depend stage will occasionally produce warnings or errors, since the dependency parsing tool is unable to find all of the necessary header files (e.g., netcdf.inc). In these circumstances, it is usually OK to ignore the warnings/errors and proceed to the next step.

Next, you compile the code using:

make

The make command creates an executable called mitgcmuv. Additional make "targets" are defined within the Makefile to aid in the production of adjoint and other versions of MITgcm. On SMP (shared multi-processor) systems, the build process can often be sped up appreciably using the command:

make -j 4

where the "4" can be replaced with a number that corresponds to the number of CPUs available.

Debugging Tip: If you are only making small changes to the code, you don't need to go through the whole build setup (genmake2 and make depend) again; just type "make" to recompile.

If the compilation goes well (i.e., no error messages), we can move the generated mitgcmuv executable to your run directory:

mv mitgcmuv ../run

 

6. Running the simulation

The first time you run, you need to follow steps 6.0-6.6 to set up your run directory properly.  

MITgcm output is large and will quickly fill up your space on the head node (300 GB). If you are running out of space on the head node, contact Jeff Scott (jscott@mit.edu) to ask for disk space on a file server. Once you have space on a file server, at a minimum you should store your model output there. You can even set up your run/ directory on the file server and run jobs directly from there, so you're not bogging down the head node by moving large files after each run. If you want to see an example of a run/ directory set up on a file server, check these out:

/net/fs02/d2/geos_harvard/helen/MITgcm_ECCOv4/verification/global_darwin_llc90/run/

/net/fs02/d2/geos_harvard/helen/MITgcm_ECCOv4/verification/global_pcb_llc90/run/

6.0 Set up empty directories (you'll fill them later)

From your MITgcm/ directory, follow these commands to make the directories you'll need:

mkdir verification/

cd verification/

 

mkdir global_darwin_llc90/          (if you are running DARWIN plankton only)

cd global_darwin_llc90/

mkdir run/

 

mkdir global_hg_llc90/                (if you are running the Hg package)

cd global_hg_llc90/

mkdir run/

 

mkdir global_pcb_llc90/                (if you are running the PCB package)

cd global_pcb_llc90/

mkdir run/

6.1 Copy first batch of files

Copy these folders to verification/:

/home/geos_harvard/yanxu/MITgcm/verification/global_oce_cs32

/home/geos_harvard/yanxu/MITgcm/verification/global_oce_input_fields

Copy these folders to global_X_llc90/, where 'X' corresponds to the simulation you're trying to run (e.g., 'darwin', 'hg', 'pcb'):

/home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/input/ 

/home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/input.core2

/home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/input.ecco_v4

/home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/input.ecmwf

/home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/input_itXX

Go into the input_itXX/ directory and update the file paths:

set dirInputFields = YOUR_WORKING_DIRECTORY/MITgcm/verification/global_oce_input_fields

set dirLlc90          = /home/geos_harvard/yanxu/MITgcm/verification/global_oce_llc90

Execute these commands in your run/ directory:

 csh        (note: this command opens a c-shell)

../input_itXX/prepare_run

 exit         (note: this closes the c-shell)

6.2 Link forcing files to your run folder

Go to your run/ folder:

cd YOUR_WORKING_DIR/run

ln -s ~gforget/ecco_v4_input/controls/* .

ln -s ~gforget/ecco_v4_input/MITprof/* .

ln -s ~gforget/ecco_v4_input/pickups/* .

ln -s ~gforget/ecco_v4_input/era-interim/* .

6.3 Forcing folder

Still in your run/ directory, make a forcing/ subdirectory: 

mkdir forcing

Move all the forcing files inside:

mv EIG* forcing/

mv runoff-2d-Fekete-1deg-mon-V4-SMOOTH.bin forcing/

6.4 Initial conditions and other forcing files

Still in your run/ directory, make an initial/ subdirectory: 

mkdir initial/

In this folder, you can put the initial conditions of your tracers. If you need a copy of the initial conditions for the ECCOv4 DARWIN simulation, they're available here:

/home/yanxu/MITgcm/verification/global_darwin_llc90/run1/initial/

6.5 Control files

Still in your run/ directory, make a control/ subdirectory:

mkdir control

Move all the control files into this folder:

mv wt_* xx_* control/

6.6 data* files

If you're running an Hg simulation, copy data* files to your run/ directory from here:

cp /home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/run/data* .

If you're running the PCB simulation, copy data* files to your run/ directory from here:

cp /net/fs02/d2/geos_harvard/helen/MITgcm_ECCOv4/verification/global_pcb_llc90/run/data* .

If you're running the DARWIN ecology simulation, copy data* files to your run/ directory from here:

cp /home/geos_harvard/yanxu/MITgcm/verification/global_darwin_llc90/run1/data* .

6.7 Submit job

Copy the submit script into run/:

cp /home/geos_harvard/yanxu/MITgcm/verification/global_hg_llc90/qsub_itXX.csh .

Modify the path to your run/ directory and the simulation name in qsub_itXX.csh. If you don't do this, you could overwrite Yanxu's or Helen's output, and we will be very unhappy. In qsub_itXX.csh, you can also choose different queues, depending on the length of your task: short, medium, long, xlong, xxlong (sounds like sizes for a T-shirt, doesn't it?).


Then we can submit the job to the queue. Jobs can be submitted from any run directory, but you must be logged into the head node (i.e., your login should look like ssh -Y <username>@svante.mit.edu). To submit:

qsub qsub_itXX.csh
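While the job is queued or running, you can monitor it with the standard PBS/Torque command (assuming the usual PBS client tools are available on your login node):

qstat -u <username>                                (shows the state of your jobs in the queue)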

If your run finishes without any problems, the very last line of your STDOUT.0000 file should indicate that the model 'ENDED NORMALLY'.
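A quick way to check, for example:

grep "ENDED NORMALLY" STDOUT.0000                  (run this in your run/ directory; it should print one matching line)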

6.8 Debugging tips

1. If your run crashes, check the following files for error messages:

STDERR.0000

STDOUT.0000

<job_name>.o<job_id>       (where <job_name> is whatever you named the run in your qsub_itXX.csh file and <job_id> is the ID number assigned by the queue system)

2. You may also find it helpful to check the *.f files in your build/ directory. These are the preprocessed source files that are actually compiled, so if pieces of code are being chopped out or #include statements are missing, it will show up in the *.f files.

3. If you want to isolate whether your problem is coming from partitioning, chemistry, deposition, etc., you can comment out individual processes in gchem_calc_tendency.F, recompile with 'make', and run with a limited number of processes turned on.

 

Debugging with fewer processors (faster!)

ECCOv4 is configured to run on 96 cores. While you are encouraged to run with 96 cores for simulations that you'll do science with, 96 cores is a big request and you can end up waiting a LONG time in the queue for your job to begin. If you are debugging and only need to do short tests, use 13 cores instead. Here's how to change the number of cores you request for a job (a condensed set of example commands is given after the list):

  1. Go to your code/ directory. 
  2. Make a copy of your SIZE.h file and rename it SIZE.h.96np. Now when you need to go back to 96 cores, your SIZE.h file information is saved.
  3. Open SIZE.h
  4. In SIZE.h, set sNx and sNy to 90
  5. In SIZE.h, set nPx to 13
  6. Save and close SIZE.h
  7. Go to your build/ directory. 
  8. Recompile your code. 
  9. Move the mitgcmuv* executable to your run/ directory.
  10. Go to your run directory.
  11. Make a copy of data.exch2 and rename it data.exch2.96np. Now when you need to go back to 96 cores, it's saved.
  12. Open data.exch2.
  13. Comment out the line blanklist = XXXX, where XXXX is a list of numbers.
  14. Save and close data.exch2
  15. Make a copy of your run script and rename it <your_run_script.csh.96np>. Now when you need to go back to 96 cores, your run script is saved.
  16. Open your run script (e.g., qsub_itXX.csh)
  17. Change the number of cores you're requesting from 96 to 13. Do NOT request sandy cores for debugging; this slows everyone down. You can request nehalems, or check which nodes are free and specifically request those. More detail is available in the Svante User's Manual, but here are the basics.

    Type this in the terminal to check which nodes are free:

    > module load nodeload      
    > nodeload

    Modify these two lines in your qsub_itXX.csh script to request 13 cores:

    #PBS -l nodes=13                                  (to ask for any 13 nodes)
    or                                
    #PBS -l nodes=agnetha:ppn=7+bjorn:ppn=6           (to ask for specific nodes, "agnetha" and "bjorn")
    mpirun -np 13 ./mitgcmuv                          (to launch your run with 13 MPI processes)

     

  18. Save and close your run script. 
  19. Submit your job.
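For reference, here is the same procedure condensed into commands (a sketch only; the edits to SIZE.h, data.exch2, and your run script still have to be made by hand in a text editor):

cd YOUR_WORKING_DIR/code
cp SIZE.h SIZE.h.96np                    (save the 96-core version; then edit SIZE.h so that sNx = 90, sNy = 90, nPx = 13)
cd ../build
make                                     (recompile)
mv mitgcmuv ../run
cd ../run
cp data.exch2 data.exch2.96np            (save the 96-core version; then comment out the blanklist line in data.exch2)
cp qsub_itXX.csh qsub_itXX.csh.96np      (save the 96-core run script; then edit qsub_itXX.csh to request 13 cores as shown above)
qsub qsub_itXX.csh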

6.9 Final remarks

A Best Practice Guide for Svante is here [pdf]. This has useful information about the queue system and storage.

Documentation for ECCOv4, the physical model configuration underlying our simulations, is available [pdf], along with the associated publication [pdf].

Processing model output and regridding input files involves the gcmfaces/ package. Documentation for gcmfaces/ is available here [pdf].

Special thanks to Gael Forget, Stephanie Dutkiewicz and Jeffery Scott at MIT. 

 

7. Issues to watch out for

Is your run crashing because of diagnostics issues? Check the following (a few quick grep checks are given after this list):

  • The value assigned to PTRACERS_num in code/PTRACERS_SIZE.h needs to match the value of PTRACERS_numInUse in run/data.ptracers.
  • If PTRACERS_useKPP = .FALSE. in run/data.ptracers, then you have to remove all KPP diagnostics from your run/data.diagnostics file.
  • Make sure your package is turned on in run/data.gchem (e.g., usePCB = .TRUE.).
  • Make sure the gchem package is turned on in run/data.pkg (useGChem = .TRUE.).
  • Make sure your package is listed in code/packages.config under 'gchem' (e.g., 'hg' or 'pcb').
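For example, a few quick grep checks (the paths mirror those in the list above; adjust them to your own layout):

grep PTRACERS_num code/PTRACERS_SIZE.h             (compare with the value printed by the next command)
grep PTRACERS_numInUse run/data.ptracers
grep -i kpp run/data.ptracers run/data.diagnostics (if useKPP is .FALSE., no KPP diagnostics should appear)
grep -i "use" run/data.gchem run/data.pkg          (confirm usePCB/useHG and useGChem are .TRUE.)
cat code/packages.config                           (confirm your package, e.g., 'hg' or 'pcb', is listed)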
