...
1. General introduction
This wiki introduces the process to run the MIT General Circulation Model (MITgcm) Hg/POPs simulations, as well as the Global Terrestrial Mercury Model (GTMM) coupled to the GEOS-Chem Hg simulation, on the Harvard FAS Odyssey system. General information regarding MITgcm can be obtained from the MITgcm user's manual. Here I only introduce the content that is immediately related to setting up a Hg/POPs simulation on the Harvard machines, but the process should be similar for other machines and computing environments. Note that this documentation is a work in progress.
To proceed, you need to first obtain a FAS Research Computing account from here.
We have two types of simulations so far:
1) 1 deg x 1 deg offline simulation with ECCO-GODAE assimilated ocean circulation data over a global domain except the Arctic;
2) 36 km x 36 km online simulation with NCEP forcing from the atmosphere over the Arctic.
A third type of simulation with global coverage is also on the way.
These simulations share the same source code; the differences exist only in the configuration files and run directories.
...
See the other relevant wiki pages for environment set-up scripts.
On Odyssey, please always do work like this in an interactive session. To start an interactive session, enter the command:
interactive 4 1 4000 240 test
The following might work with any compiler, but it has only been tested with the GEOS-Chem settings for version 11 of the Intel compiler, so run the following command:
load_if11
2. Obtain source code
2.1 Obtain MITgcm and GTMM/GEOS-Chem code
For users from the Harvard BGC group, a copy of the source code can be obtained from our home directories:
MITgcm Hg/POPs code: cp -r /n/home05/yxzhang/pub/MITgcm/* YOUR_SOURCE_DIRECTORY
GTMM/GEOS-Chem code: cp -r /n/home02/thackray/GCgtmm YOUR_SOURCE_DIRECTORY
For users outside this group, we are currently working on a GitHub site; a link will follow when the public version is complete.
The numerical model is contained within an execution environment support wrapper. This wrapper is designed to provide a general framework for grid-point models. MITgcm is a specific numerical model that uses the framework. Under this structure the model is split into execution environment support code and conventional numerical model code. The execution environment support code is held under the eesupp directory. The grid point model code is held under the model directory. Code execution actually starts in the eesupp routines and not in the model routines. For this reason the top-level MAIN.F is in the eesupp/src directory. In general, end-users should not need to worry about this level. The top-level routine for the numerical part of the code is in model/src/THE_MODEL_MAIN.F. Here is a brief description of the directory structure of the model under the root tree:
doc: contains brief documentation notes.
eesupp: contains the execution environment source code. Also subdivided into two subdirectories inc and src.
model: this directory contains the main source code. Also subdivided into two subdirectories inc and src.
pkg: contains the source code for the packages. Each package corresponds to a subdirectory. For example, gmredi contains the code related to the Gent-McWilliams/Redi scheme, and aim contains the code related to the atmospheric intermediate physics.
tools: this directory contains various useful tools. For example, genmake2 is a script written in csh (C-shell) that should be used to generate your makefile. The directory adjoint contains the makefile specific to the Tangent linear and Adjoint Compiler (TAMC) that generates the adjoint code. This directory also contains the subdirectory build_options, which contains the `optfiles' with the compiler options for the different compilers and machines that can run MITgcm.
utils: this directory contains various utilities. The subdirectory knudsen2 contains code and a makefile that compute coefficients of the polynomial approximation to the Knudsen formula for an ocean nonlinear equation of state. The matlab subdirectory contains MATLAB scripts for reading model output directly into MATLAB. The scripts subdirectory contains C-shell post-processing scripts for joining processor-based and tile-based model output. The subdirectory exch2 contains the code needed for the exch2 package to work with different combinations of domain decompositions.
jobs: contains sample job scripts for running MITgcm.
lsopt: Line search code used for optimization.
optim: Interface between MITgcm and line search code.
3. Compiling process
To compile the code, we use the make program. This uses a file (Makefile) that allows us to pre-process source files, specify compiler and optimization options and also figures out any file dependencies. We supply a script (genmake2) that automatically creates the Makefile for you. You then need to build the dependencies and compile the code.
3.1. make a working directory
A working directory should be created alongside the source code directory, and a dedicated working directory should be made for each simulation you plan to run. In the working directory, you should also create three subdirectories:
code: header/option or other files that are often modified
build: where the make program puts the intermediate source code
run: the run directory
The run directory will be introduced further below. Before compiling the code, we need to obtain the contents of the code directory:
For the ECCO offline global simulation except Arctic Ocean:
Hg simulation: cp /n/home05/yxzhang/MITgcm/myinorghgrun.ecco-godge/code/* YOUR_WORKING_DIR/code/
PFC simulation: cp /n/home05/yxzhang/pub/pfcrun/code/* YOUR_WORKING_DIR/code/
For the online Arctic simulation:
cp /n/home05/yxzhang/MITgcm/myinorghgrun.arctic/code/* YOUR_WORKING_DIR/code/
Note that the Hg simulation is used as an example here. For POPs simulations, the content of this directory also differs. I will post the link later.
3.2. customize the code directory
A list of files in the code directory:
For the ECCO offline global simulation except Arctic Ocean:
- CD_CODE_OPTIONS.h
- CPP_OPTIONS.h
- DIAGNOSTICS_SIZE.h
- GAD_OPTIONS.h
- GCHEM_OPTIONS.h
- GMREDI_OPTIONS.h
- HG_OPTIONS.h (or PCB_OPTIONS.h and PFOS_OPTIONS.h depending on the simulation type)
- packages.conf
- PTRACERS_SIZE.h
- SIZE.h
For the online Arctic simulation:
- CPP_OPTIONS.h
- DIAGNOSTICS_SIZE.h
- EXF_OPTIONS.h
- GCHEM_OPTIONS.h
- HG_OPTIONS.h (or PCB_OPTIONS.h and PFOS_OPTIONS.h depending on the simulation type)
- OBSC_OPTIONS.h
- packages.conf
- PTRACERS_SIZE.h
- SEAICE_OPTIONS.h
- SIZE.h
We don't need to modify most of these files; the main exceptions are SIZE.h and HG_OPTIONS.h (or the corresponding option file for your simulation type). In SIZE.h, we usually modify sNx and sNy, the number of grid points per tile in the x and y directions, and nPx and nPy, the number of processors used in each direction. For the ECCO offline global simulation, we need sNx * nPx = 360 and sNy * nPy = 160. For the online Arctic simulation, these two numbers are 210 and 192. The more processors you declare here, the faster your program will run, although the speed-up is not linear. On the FAS machines, a processor count between 100 and 200 is appropriate for the scale of our simulations.
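As a quick check, the tile sizes and processor counts must multiply out to the full grid dimensions. The numbers below are only one illustrative decomposition of the 360 x 160 ECCO grid, not necessarily the settings shipped in the example code directory:
# hypothetical decomposition: sNx=36, nPx=10 and sNy=16, nPy=10
# 36*10 = 360 points in x, 16*10 = 160 points in y, 10*10 = 100 processors in total
echo $((36*10)) $((16*10)) $((10*10))   # prints: 360 160 100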
3.3. compiling the code
Load the proper compilers for the code (MPI, intel Fortran etc.):
module load hpc/openmpi-intel-latest
Go to the build directory in your working directory:
cd YOUR_WORKING_DIR/build
First, build the Makefile:
SOURCE_DIRECTORY/tools/genmake2 -mods=../code -of SOURCE_DIRECTORY/tools/build_options/linux_ia64_ifort+mpi_harvard2
The command line option tells genmake to override model source code with any files in the directory ../code/.
Once a Makefile has been generated, we create the dependencies with the command:
make depend
This modifies the Makefile by attaching a (usually, long) list of files upon which other files depend. The purpose of this is to reduce re-compilation if and when you start to modify the code. The make depend command also creates links from the model source to this directory. It is important to note that the make depend stage will occasionally produce warnings or errors since the dependency parsing tool is unable to find all of the necessary header files (e.g., netcdf.inc). In these circumstances, it is usually OK to ignore the warnings/errors and proceed to the next step.
Next one can compile the code using:
make
The make command creates an executable called mitgcmuv. Additional make "targets" are defined within the makefile to aid in the production of adjoint and other versions of MITgcm. On SMP (shared multi-processor) systems, the build process can often be sped up appreciably using the command:
make -j 8
where the "8" can be replaced with a number that corresponds to the number of CPUs available.
This marks the end of the compiling process. Now move the mitgcmuv file to your run directory:
mv mitgcmuv ../run
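For reference, here is the whole build sequence in one place. This is only a sketch; SOURCE_DIRECTORY and YOUR_WORKING_DIR are the placeholders used above:
module load hpc/openmpi-intel-latest
cd YOUR_WORKING_DIR/build
SOURCE_DIRECTORY/tools/genmake2 -mods=../code -of SOURCE_DIRECTORY/tools/build_options/linux_ia64_ifort+mpi_harvard2
make depend          # builds the dependency list; occasional warnings are usually safe to ignore
make -j 8            # compile; adjust -j to the number of CPUs available
mv mitgcmuv ../run   # move the executable to the run directory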
4. Run the simulation
4.1 obtain the run directory
For the ECCO offline global simulation except Arctic Ocean:
Hg simulation: cp -r /n/home05/yxzhang/MITgcm/myinorghgrun.ecco-godge/run/* YOUR_WORKING_DIR/run/
PFC simulation: cp -r /n/home05/yxzhang/pub/pfcrun/run/* YOUR_WORKING_DIR/run/
A list of files and folders inside of this directory:
- data.gchem: options for running the gchem package
- data.off: options for offline simulation
- data.gmredi: options for gmredi package
- data.pkg: options for using different packages
- data: major control file for your simulation
- data.hg or data.pcb or data.pfc: options/paths for Hg related files
- data.ptracers: options and definition of tracers
- data.cal: options for calendar
- data.kpp: options for kpp package
- data.diagnostics: options for the output of diagnostics
- eedata
- POLY3.COEFFS
- run.hg: this is the run script
- input: contains basic input files
- input_gc: input from GEOS-Chem
- input_darwin: input from the DARWIN model output, including DOC, POC concentrations and fluxes etc.
For the online Arctic simulation:
cp -r /n/home05/yxzhang/MITgcm/myinorghgrun.arctic/run/* YOUR_WORKING_DIR/run/
A list of files and folders inside of this directory:
- data: major control file for your simulation
- data.cal: options for calendar
- data.diagnostics: options for the output of diagnostics
- data.gchem: options for running the gchem package
- data.gmredi: options for gmredi package
- data.hg or data.pcb or data.pfc: options/paths for Hg related files
- data.kpp: options for kpp package
- data.obcs: options for open boundary conditions
- data.pkg: options for using different packages
- data.ptracers: options and definition of tracers
- data.salt_plume
- data.seaice
- DXC.bin
- DXF.bin
- DXG.bin
- DXV.bin
- DYC.bin
- DYF.bin
- DYG.bin
- DYU.bin
- eedata
- LATC.bin
- LATG.bin
- LONC.bin
- LONG.bin
- RA.bin
- RAS.bin
- RAW.bin
- RAZ.bin
- run.arctic.hg: this is the run script
- input_hg: input for Hg related fields
- input_darwin: input from the DARWIN model output, including DOC, POC concentrations and fluxes etc.
- obcs: boundary conditions
4.2 Ocean circulation and forcing files
For the ECCO offline simulation, we need to specify the path to the offline ocean circulation data in the data.off file. Please specify the path as:
/n/home05/yxzhang/scratch/offline
For the online Arctic simulation, we also need to specify the atmospheric forcing files in data.exf, using the path:
/n/home05/yxzhang/scratch/input.arctic
These files are pretty large, so we don't need to keep multiple copies of them.
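Before pointing data.off or data.exf at these directories, it may be worth a quick check that they are readable from your account (a sketch; the paths are the shared directories listed above):
ls /n/home05/yxzhang/scratch/offline | head         # ECCO-GODAE offline circulation files
ls /n/home05/yxzhang/scratch/input.arctic | head    # NCEP atmospheric forcing files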
4.3 Boundary conditions
As the online Arctic simulation is a regional model, it requires boundary conditions along the border of the model domain. The boundary conditions are already prepared in the obcs directory. However, for your own simulation you may need to prepare new boundary conditions. The code I used for this purpose can be copied from:
/n/home05/yxzhang/pub/obcs/
4.4 submit job
In your run directory, you can submit your job by using the job script provided:
sbatch run.hg or sbatch run.arctic.hg
We don't need to modify these script files except for the number of processors requested, which should match the processor count implied by the SIZE.h file mentioned in section 3.2.
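For orientation, a minimal SLURM batch script for an MPI MITgcm run might look like the sketch below; the job name, partition, wall time, and task count are placeholders, and run.hg / run.arctic.hg remain the authoritative versions:
#!/bin/bash
#SBATCH -J mitgcm_hg           # job name (placeholder)
#SBATCH -n 100                 # MPI tasks; must match the processor count from SIZE.h
#SBATCH -t 24:00:00            # wall time (placeholder)
#SBATCH -p general             # partition (placeholder; use the one in the provided script)
#SBATCH -o mitgcm_%j.out       # log file
module load hpc/openmpi-intel-latest
mpirun -np $SLURM_NTASKS ./mitgcmuv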
A quick guide for submitting and managing jobs on Odyssey is available here.
The remaining sections describe the setup for GTMM coupled to GEOS-Chem (the code was obtained in section 2.1).
2.2 Obtain GTMM data
Again, you can copy my version:
cp -r /n/home02/thackray/GTMM YOUR_GTMM_DIRECTORY
This also contains a run directory for your pre-GEOS-Chem spin up.
3. Edit and compile GTMM code
3.1 Edit GTMM code
Edit the file YOUR_SOURCE_DIRECTORY/GTMM/defineConstants.F90 to point "filepath" and "outputpath" to the relevant directories in YOUR_GTMM_DIRECTORY.
Note** you must also change "f_len" and "f_len_output" to match your defined paths (this silliness will be removed in the future).
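If, as I assume here, f_len and f_len_output are simply the character lengths of the filepath and outputpath strings, you can get the numbers to paste in from the shell (YOUR_FILEPATH and YOUR_OUTPUTPATH stand for the exact strings you set in defineConstants.F90):
echo -n "YOUR_FILEPATH" | wc -c      # value for f_len
echo -n "YOUR_OUTPUTPATH" | wc -c    # value for f_len_output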
3.2 Compile GTMM code
For now, GTMM must be compiled before GEOS-Chem to make its library available for the GEOS-Chem code. In the future, this will be built into the GEOS-Chem build sequence.
cd YOUR_SOURCE_DIRECTORY/GTMM
bash compile.sh
cp gtmm YOUR_GTMM_DIRECTORY/output/
3.3 Edit GEOS-Chem code
Until the GEOS-Chem build sequence is updated, edit the file YOUR_SOURCE_DIRECTORY/Makefile_header.mk and change line 896 to read:
GTMM_NEEDED :=1
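To locate and verify this setting without relying on the exact line number (which may shift between versions), something like the following works:
grep -n "GTMM_NEEDED" YOUR_SOURCE_DIRECTORY/Makefile_header.mk    # should show GTMM_NEEDED :=1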
3.4 Compile GEOS-Chem code
At this point the best way to proceed is to make a 4x5 degree Hg run directory using the GEOS-Chem unit tester.
(If you'd like an easier option, you can also do:
cp -r /n/home02/thackray/gtmmrun YOUR_RUN_DIRECTORY
)
Once this is done, edit input.geos so that line 104 reads:
Use GTMM soil model? : T
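Since the exact line number may differ between GEOS-Chem versions, you can double-check the switch with:
grep -n "GTMM" YOUR_RUN_DIRECTORY/input.geos    # the soil model line should end with ": T"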
Now you are ready to compile GEOS-Chem in this run directory:
cd YOUR_RUN_DIRECTORY
make -j4 mpbuild
Note** if you did not use the unit tester to make your run directory, you will have to edit the path that "CODE_DIR" points to in YOUR_RUN_DIRECTORY/Makefile
If compilation completes normally, an executable "geos.mp" will be created.
4. Run the simulation
A quick guide for submitting and managing jobs on Odyssey is available here.
4.1 Run GTMM standalone (optional?)
If you need to do a spin up of GTMM uncoupled from GEOS-Chem, you can do this:
cd YOUR_GTMM_DIRECTORY/output
Edit rungtmm.sh to point to your directory
sbatch rungtmm.sh
When it's done, you can do the following to make GEOS-Chem happy:
cp HgPools HgPools.0
4.2 Run GEOS-Chem coupled to GTMM
An example runscript is available at /n/home02/thackray/gtmmrun/dorun.sh
Edit this run script to use YOUR_RUN_DIRECTORY instead of the one defined by me.
You can submit your job by doing:
sbatch dorun.sh
Sit back and wait for output!