MITgcm Odyssey Guide
1. General introduction
This wiki describes how to run MIT General Circulation Model (MITgcm) Hg/POPs simulations on the Harvard FAS Odyssey system. General information about MITgcm can be found in the MITgcm users manual. Here I only cover the material that is immediately relevant to setting up a Hg/POPs simulation on the Harvard machine, but the process should be similar for other machines and computing environments.
To proceed, you need to first obtain a FAS Research Computing account from here.
We have two types of simulations so far:
1) 1 deg x 1 deg offline simulation with ECCO-GODAE assimilated ocean circulation data over a global domain except the Arctic;
2) 36 km x 36 km online simulation with NCEP forcing from the atmosphere over the Arctic.
A third type of simulation with global coverage is also on the way.
These simulations share the same source code; the differences exist only in the configuration files and run directories.
2. Obtain source code
For users from the Harvard BGC group, a copy of the source code can be obtained from my home directory:
cp -r /n/home05/yxzhang/pub/MITgcm/* SOURCE_DIRECTORY
For users outside this group, we are currently working on a GitHub site.
The numerical model is contained within an execution environment support wrapper. This wrapper is designed to provide a general framework for grid-point models; MITgcm is a specific numerical model that uses the framework. Under this structure the model is split into execution environment support code and conventional numerical model code. The execution environment support code is held under the eesupp directory. The grid point model code is held under the model directory. Code execution actually starts in the eesupp routines and not in the model routines; for this reason the top-level MAIN.F is in the eesupp/src directory. In general, end-users should not need to worry about this level. The top-level routine for the numerical part of the code is in model/src/THE_MODEL_MAIN.F. Here is a brief description of the directory structure of the model under the root tree:
doc: contains brief documentation notes.
eesupp: contains the execution environment source code. Also subdivided into two subdirectories inc and src.
model: this directory contains the main source code. Also subdivided into two subdirectories inc and src.
pkg: contains the source code for the packages. Each package corresponds to a subdirectory. For example, gmredi contains the code related to the Gent-McWilliams/Redi scheme, and aim contains the code related to the atmospheric intermediate physics.
tools: this directory contains various useful tools. For example, genmake2 is a script written in csh (C-shell) that should be used to generate your makefile. The directory adjoint contains the makefile specific to the Tangent linear and Adjoint Compiler (TAMC) that generates the adjoint code. This directory also contains the subdirectory build_options, which contains the "optfiles" with the compiler options for the different compilers and machines that can run MITgcm.
utils: this directory contains various utilities. The subdirectory knudsen2 contains code and a makefile that compute coefficients of the polynomial approximation to the Knudsen formula for an ocean nonlinear equation of state. The matlab subdirectory contains MATLAB scripts for reading model output directly into MATLAB. The scripts subdirectory contains C-shell post-processing scripts for joining processor-based and tile-based model output. The subdirectory exch2 contains the code needed for the exch2 package to work with different combinations of domain decompositions.
jobs: contains sample job scripts for running MITgcm.
lsopt: Line search code used for optimization.
optim: Interface between MITgcm and line search code.
3. Compiling process
To compile the code, we use the make program. It relies on a Makefile that pre-processes source files, specifies compiler and optimization options, and works out file dependencies. We supply a script (genmake2) that automatically creates the Makefile for you. You then need to build the dependencies and compile the code.
3.1. make a working directory
A working directory should be created alongside the source code directory, and a dedicated working directory should be made for each simulation you plan to run. In the working directory, you should also make three subfolders (see the example after this list):
code: header/option files and other files that are often modified
build: where genmake2 and make put the Makefile, intermediate files, and the compiled executable
run: the run directory
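For example, if YOUR_WORKING_DIR stands for the path of your working directory (a placeholder, as elsewhere in this guide), the three subfolders can be created with:
mkdir -p YOUR_WORKING_DIR/code YOUR_WORKING_DIR/build YOUR_WORKING_DIR/run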
The run directory will be described further below. Before compiling the code, we need to obtain the contents of the code directory:
For the ECCO offline global simulation except Arctic Ocean:
Hg simulation: cp /n/home05/yxzhang/MITgcm/myinorghgrun.ecco-godge/code/* YOUR_WORKING_DIR/code/
PFC simulation: cp /n/home05/yxzhang/pub/pfcrun/code/* YOUR_WORKING_DIR/code/
For the online Arctic simulation:
cp /n/home05/yxzhang/MITgcm/myinorghgrun.arctic/code/* YOUR_WORKING_DIR/code/
Note that I use the Hg simulation as an example here. For POPs simulations, the contents of this directory are also different; I will post the link later.
3.2. customize the code directory
A list of files in the code directory:
For the ECCO offline global simulation except Arctic Ocean:
- CD_CODE_OPTIONS.h
- CPP_OPTIONS.h
- DIAGNOSTICS_SIZE.h
- GAD_OPTIONS.h
- GCHEM_OPTIONS.h
- GMREDI_OPTIONS.h
- HG_OPTIONS.h (or PCB_OPTIONS.h and PFOS_OPTIONS.h depending on the simulation type)
- packages.conf
- PTRACERS_SIZE.h
- SIZE.h
For the online Arctic simulation:
- CPP_OPTIONS.h
- DIAGNOSTICS_SIZE.h
- EXF_OPTIONS.h
- GCHEM_OPTIONS.h
- HG_OPTIONS.h (or PCB_OPTIONS.h and PFOS_OPTIONS.h depending on the simulation type)
- OBSC_OPTIONS.h
- packages.conf
- PTRACERS_SIZE.h
- SEAICE_OPTIONS.h
- SIZE.h
We don't need to modify most of these files; the main exceptions are SIZE.h and HG_OPTIONS.h (or the corresponding options file for your simulation type). In the SIZE.h file, we usually modify sNx and sNy, the numbers of grid points of each subgrid (tile) in the x and y directions, and nPx and nPy, the numbers of processors to use in each direction. For the ECCO offline global simulation, we need sNx * nPx = 360 and sNy * nPy = 160. For the online Arctic simulation, these two products are 210 and 192. The more processors you declare here, the faster the program will run, although the speedup is not linear. On the FAS machine, a total processor count between 100 and 200 is appropriate for the scale of our simulations.
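For example, one possible decomposition for the ECCO offline global run is sNx = 24, nPx = 15, sNy = 16, nPy = 10, which gives 360 x 160 grid points on 150 processors. The excerpt below is only a sketch of what the corresponding part of the PARAMETER block in SIZE.h could look like; the tile sizes, overlap widths (OLx, OLy) and processor counts are illustrative assumptions, so keep the values already present in the SIZE.h you copied unless you have a reason to change them:
     &           sNx =  24,
     &           sNy =  16,
     &           OLx =   4,
     &           OLy =   4,
     &           nSx =   1,
     &           nSy =   1,
     &           nPx =  15,
     &           nPy =  10,
With this choice, the job script in section 4.4 should request 150 processors.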
3.3. compiling the code
Load the proper compiler modules for the code (MPI, Intel Fortran, etc.):
module load hpc/openmpi-intel-latest
Go to the build directory in your working directory:
cd YOUR_WORKING_DIR/build
First, build the Makefile:
SOURCE_DIRECTORY/tools/genmake2 -mods=../code -of SOURCE_DIRECTORY/tools/build_options/linux_ia64_ifort+mpi_harvard2
The -mods command line option tells genmake2 to override model source code with any files in the directory ../code/; the -of option selects the optfile with the compiler settings for this machine.
Once a Makefile has been generated, we create the dependencies with the command:
make depend
This modifies the Makefile by attaching a (usually long) list of files upon which other files depend. The purpose of this is to reduce re-compilation if and when you start to modify the code. The make depend command also creates links from the model source to this directory. It is important to note that the make depend stage will occasionally produce warnings or errors since the dependency parsing tool is unable to find all of the necessary header files (e.g. netcdf.inc). In these circumstances, it is usually OK to ignore the warnings/errors and proceed to the next step.
Next one can compile the code using:
make
The make command creates an executable called mitgcmuv. Additional make "targets" are defined within the Makefile to aid in the production of adjoint and other versions of MITgcm. On SMP (shared multi-processor) systems, the build process can often be sped up appreciably using the command:
make -j 8
where the "8" can be replaced with a number that corresponds to the number of CPUs available.
This marks the end of the compiling process. Now move the mitgcmuv executable to your run directory:
mv mitgcmuv ../run
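If you later change SIZE.h or any other file in the code directory, a safe approach (my suggestion, not an original step of this guide) is to start from an empty build directory and repeat the steps above; the Makefile generated by genmake2 also typically provides cleaning targets (clean/Clean/CLEAN) that serve a similar purpose:
cd YOUR_WORKING_DIR/build
rm -rf *
SOURCE_DIRECTORY/tools/genmake2 -mods=../code -of SOURCE_DIRECTORY/tools/build_options/linux_ia64_ifort+mpi_harvard2
make depend
make -j 8
mv mitgcmuv ../run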
4. Run the simulation
4.1 obtain the run directory
For the ECCO offline global simulation except Arctic Ocean:
Hg simulation: cp -r /n/home05/yxzhang/MITgcm/myinorghgrun.ecco-godge/run/* YOUR_WORKING_DIR/run/
PFC simulation: cp -r /n/home05/yxzhang/pub/pfcrun/run/* YOUR_WORKING_DIR/run/
A list of files and folders inside of this directory:
- data.gchem: options for running the gchem package
- data.off: options for offline simulation
- data.gmredi: options for gmredi package
- data.pkg: options for using different packages
- data: major control file for your simulation
- data.hg or data.pcb or data.pfc: options/paths for the Hg (or PCB/PFC) related files
- data.ptracers: options and definition of tracers
- data.cal: options for calendar
- data.kpp: options for kpp package
- data.diagnostics: options for the output of diagnostics
- eedata
- POLY3.COEFFS
- run.hg: this is the run script
- input: contains basic input files
- input_gc: input from GEOS-Chem
- input_darwin: input from the DARWIN model output, including DOC, POC concentrations and fluxes etc.
For the online Arctic simulation:
cp -r /n/home05/yxzhang/MITgcm/myinorghgrun.arctic/run/* YOUR_WORKING_DIR/run/
A list of files and folders inside of this directory:
- data: major control file for your simulation
- data.cal: options for calendar
- data.diagnostics: options for the output of diagnostics
- data.gchem: options for running the gchem package
- data.gmredi: options for gmredi package
- data.hg or data.pcb or data.pfc: options/paths for the Hg (or PCB/PFC) related files
- data.kpp: options for kpp package
- data.obcs: options for open boundary conditions
- data.pkg: options for using different packages
- data.ptracers: options and definition of tracers
- data.salt_plume: options for the salt_plume package
- data.seaice: options for the seaice package
- DXC.bin
- DXF.bin
- DXG.bin
- DXV.bin
- DYC.bin
- DYF.bin
- DYG.bin
- DYU.bin
- eedata
- LATC.bin
- LATG.bin
- LONC.bin
- LONG.bin
- RA.bin
- RAS.bin
- RAW.bin
- RAZ.bin
- run.arctic.hg: this is the run script
- input_hg: input for Hg related fields
- input_darwin: input from the DARWIN model output, including DOC, POC concentrations and fluxes etc.
- obcs: boundary conditions
4.2 Ocean circulation and forcing files
For the ECCO offline simulation, we need to specify the path to the offline ocean circulation data in the data.off file. Please specify the path as:
/n/home05/yxzhang/scratch/offline
For the online Arctic simulation, we also need to specify the path to the atmospheric forcing files in data.exf:
/n/home05/yxzhang/scratch/input.arctic
These files are quite large, so there is no need to keep multiple copies of them; just point your simulation to the shared directories above.
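In practice, this means editing the file name entries inside data.off (and data.exf for the Arctic run) so that they point to these directories. As a hedged illustration only, using the standard MITgcm offline package variable names, which I am assuming here rather than quoting from the copied run directory, an entry might look like:
 UvelFile = '/n/home05/yxzhang/scratch/offline/UVEL_FILE_PREFIX',
where UVEL_FILE_PREFIX is a placeholder for the actual file prefix in that directory; check the data.off you copied in section 4.1 for the real entry names and file names.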
4.3 Boundary conditions
Because the online Arctic simulation is a regional model, it requires boundary conditions along the border of the model domain. The boundary conditions are already prepared in the obcs directory; however, you will need to prepare your own boundary conditions for your own simulation. The code I used for this purpose can be downloaded from:
/n/home05/yxzhang/pub/obcs/
4.4 submit job
In your run directory, you can submit your job by using the job script provided:
sbatch run.hg or sbatch run.arctic.hg
We don't need to modify these script files except for the number of processors, which should be equal to the total number (nPx * nPy) you specified in the SIZE.h file mentioned in section 3.2.
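For reference (the actual contents of run.hg and run.arctic.hg are not reproduced in this guide, so the lines below are assumptions based on a typical SLURM/MPI job script), the processor request and the MPI launch line should both match the nPx * nPy product from SIZE.h, for example 150 for the decomposition sketched in section 3.2:
#SBATCH -n 150
mpirun -np 150 ./mitgcmuv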
A quick guide for submitting and managing jobs on Odyssey is available here.
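Once the job is submitted, the standard SLURM commands can be used to monitor it (YOUR_USERNAME and JOBID are placeholders):
squeue -u YOUR_USERNAME
scancel JOBID
The first command lists your pending and running jobs; the second cancels a job by its ID.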