
1. General introduction

This wiki introduces the processes to run MIT General Circulation Model (MITgcm) Hg/POPs simulations on the Harvard FAS Odyssey system, as well as the Global Terrestrial Mercury Model (GTMM) coupled to the GEOS-Chem Hg simulation. General information regarding MITgcm can be obtained from the MITgcm users manual. Here I only introduce the content that is immediately related to setting up a Hg/POPs simulation on the Harvard machine, but the process should be similar for other machines and computing environments. This is a work in progress.

To proceed, you first need to obtain a FAS Research Computing account from here.

We have two types of simulations so far:

1) 1 deg x 1 deg offline simulation with ECCO-GODAE assimilated ocean circulation data over a global domain except the Arctic;

2) 36 km x 36 km online simulation with NCEP forcing from the atmosphere over the Arctic.

A third type of simulation with global coverage is also on the way.

These simulations share the same source code; the differences exist only in the configuration files and run directories.

...

See the relevant wiki page for environment set-up scripts.

2.1 Obtain source code

For users from the Harvard BGC group, a copy of the source code can be obtained from my home directory:

For the MITgcm Hg/POPs simulations: cp -r /n/home05/yxzhang/pub/MITgcm/* YOUR_SOURCE_DIRECTORY

For the GEOS-Chem/GTMM coupled simulation (sections 2.2 onward): cp -r /n/home02/thackray/GCgtmm YOUR_SOURCE_DIRECTORY

For users outside this group, we are currently working on a github site.

The numerical model is contained within an execution environment support wrapper. This wrapper is designed to provide a general framework for grid-point models. MITgcm is a specific numerical model that uses the framework. Under this structure the model is split into execution environment support code and conventional numerical model code. The execution environment support code is held under the eesupp directory. The grid point model code is held under the model directory. Code execution actually starts in the eesupp routines and not in the model routines. For this reason the top-level MAIN.F is in the eesupp/src directory. In general, end-users should not need to worry about this level. The top-level routine for the numerical part of the code is in model/src/THE_MODEL_MAIN.F. Here is a brief description of the directory structure of the model under the root tree: 

doc: contains brief documentation notes.
eesupp: contains the execution environment source code. Also subdivided into two subdirectories inc and src.
model: this directory contains the main source code. Also subdivided into two subdirectories inc and src.
pkg: contains the source code for the packages. Each package corresponds to a subdirectory. For example, gmredi contains the code related to the Gent-McWilliams/Redi scheme, aim the code relative to the atmospheric intermediate physics.
tools: this directory contains various useful tools. For example, genmake2 is a script written in csh (C-shell) that should be used to generate your makefile. The directory adjoint contains the makefile specific to the Tangent linear and Adjoint Compiler (TAMC) that generates the adjoint code. This directory also contains the subdirectory build_options, which contains the "optfiles" with the compiler options for the different compilers and machines that can run MITgcm.
utils: this directory contains various utilities. The subdirectory knudsen2 contains code and a makefile that compute coefficients of the polynomial approximation to the knudsen formula for an ocean nonlinear equation of state. The matlab subdirectory contains matlab scripts for reading model output directly into matlab. scripts contains C-shell post-processing scripts for joining processor-based and tiled-based model output. The subdirectory exch2 contains the code needed for the exch2 package to work with different combinations of domain decompositions.
jobs: contains sample job scripts for running MITgcm.
lsopt: Line search code used for optimization.
optim: Interface between MITgcm and line search code.

3. Compiling process

To compile the code, we use the make program. This uses a file (Makefile) that allows us to pre-process source files, specify compiler and optimization options, and work out file dependencies. We supply a script (genmake2) that automatically creates the Makefile for you. You then need to build the dependencies and compile the code.

3.1. make a working directory

A dedicated working directory, separate from the source code directory, should be made for each simulation you plan to run. In the working directory, you should also make three subfolders (see the example command after this list):

code: header/option or other files that are often modified

build: where the make program puts the intermediate source code

run: the run directory
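A minimal sketch of setting these up, using the same YOUR_WORKING_DIR placeholder as the commands below:

mkdir -p YOUR_WORKING_DIR/code YOUR_WORKING_DIR/build YOUR_WORKING_DIR/run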

The run directory is described further below. Before compiling the code, we need to obtain the contents of the code directory:

For the ECCO offline global simulation except Arctic Ocean:

Hg simulation: cp /n/home05/yxzhang/MITgcm/myinorghgrun.ecco-godge/code/* YOUR_WORKING_DIR/code/

PFC simulation: cp /n/home05/yxzhang/pub/pfcrun/code/* YOUR_WORKING_DIR/code/

 

For the online Arctic simulation:

cp /n/home05/yxzhang/MITgcm/myinorghgrun.arctic/code/* YOUR_WORKING_DIR/code/

Note that I use the Hg simulation as an example here. For POPs simulations, the content of this directory is also different; I will post the link later.

3.2. customize the code directory

A list of files in the code directory:

For the ECCO offline global simulation except Arctic Ocean:

  • CD_CODE_OPTIONS.h
  • CPP_OPTIONS.h
  • DIAGNOSTICS_SIZE.h
  • GAD_OPTIONS.h
  • GCHEM_OPTIONS.h
  • GMREDI_OPTIONS.h
  • HG_OPTIONS.h (or PCB_OPTIONS.h and PFOS_OPTIONS.h depending on the simulation type)
  • packages.conf
  • PTRACERS_SIZE.h
  • SIZE.h

For the online Arctic simulation:

  • CPP_OPTIONS.h
  • DIAGNOSTICS_SIZE.h
  • EXF_OPTIONS.h
  • GCHEM_OPTIONS.h
  • HG_OPTIONS.h (or PCB_OPTIONS.h and PFOS_OPTIONS.h depending on the simulation type)
  • OBSC_OPTIONS.h
  • packages.conf
  • PTRACERS_SIZE.h
  • SEAICE_OPTIONS.h
  • SIZE.h

We don't need to modify most of these files, except the ones marked in bold, such as SIZE.h and HG_OPTIONS.h (or the equivalent for your simulation). In SIZE.h, we usually modify sNx and sNy, the numbers of grid points per tile in the x and y directions, and nPx and nPy, the numbers of processors used in each direction. For the ECCO offline global simulation, we need sNx * nPx = 360 and sNy * nPy = 160. For the online Arctic simulation, these two products are 210 and 192. As an example, sNx = 30, nPx = 12, sNy = 16, nPy = 10 gives the required 360 x 160 grid on 120 processors. The more processors you declare here, the faster the program will run, although the speedup is not linear. On the FAS machine, 100-200 processors is appropriate for the scale of our simulations.

3.3. compiling the code

Load the proper compilers for the code (MPI, Intel Fortran, etc.):

module load hpc/openmpi-intel-latest

Go to the build directory in your working directory:

cd YOUR_WORKING_DIR/build

First, build the Makefile:

SOURCE_DIRECTORY/tools/genmake2 -mods=../code -of SOURCE_DIRECTORY/tools/build_options/linux_ia64_ifort+mpi_harvard2

The command line option tells genmake to override model source code with any files in the directory ../code/.

Once a Makefile has been generated, we create the dependencies with the command:

make depend

This modifies the Makefile by attaching a (usually, long) list of files upon which other files depend. The purpose of this is to reduce re-compilation if and when you start to modify the code. The make depend command also creates links from the model source to this directory. It is important to note that the make depend stage will occasionally produce warnings or errors since the dependency parsing tool is unable to find all of the necessary header files (e.g. netcdf.inc). In these circumstances, it is usually OK to ignore the warnings/errors and proceed to the next step.
Next one can compile the code using:

make

The make command creates an executable called mitgcmuv. Additional make "targets" are defined within the makefile to aid in the production of adjoint and other versions of MITgcm. On SMP (shared multi-processor) systems, the build process can often be sped up appreciably using the command:

make -j 8

where the "8" can be replaced with a number that corresponds to the number of CPUs available.

This marks the end of the compiling process. Now move the mitgcmuv executable to your run directory:

mv mitgcmuv ../run

4. Run the simulation

4.1 obtain the run directory

For the ECCO offline global simulation except Arctic Ocean:

Hg simulation: cp -r /n/home05/yxzhang/MITgcm/myinorghgrun.ecco-godge/run/* YOUR_WORKING_DIR/run/

PFC simulation: cp -r /n/home05/yxzhang/pub/pfcrun/run/* YOUR_WORKING_DIR/run/

A list of files and folders inside of this directory:

  • data.gchem: options for running the gchem package 
  • data.off: options for offline simulation
  • data.gmredi: options for gmredi package
  • data.pkg: options for using different packages
  • data: major control file for your simulation     
  • data.hg or data.pcb or data.pfc: options/paths for Hg related files 
  • data.ptracers: options and definition of tracers        
  • data.cal: options for calendar    
  • data.kpp: options for kpp package 
  • data.diagnostics: options for the output of diagnostics   
  • eedata
  • POLY3.COEFFS
  • run.hg: this is the run script
  • input: contains basic input files
  • input_gc: input from GEOS-Chem
  • input_darwin: input from the DARWIN model output, including DOC, POC concentrations and fluxes etc.
 
For the online Arctic simulation:

cp -r /n/home05/yxzhang/MITgcm/myinorghgrun.arctic/run/* YOUR_WORKING_DIR/run/

A list of files and folders inside of this directory:

  • data: major control file for your simulation
  • data.cal: options for calendar 
  • data.diagnostics: options for the output of diagnostics 
  • data.gchem: options for running the gchem package 
  • data.gmredi: options for gmredi package
  • data.hg or data.pcb or data.pfc: options/paths for Hg related files 
  • data.kpp: options for kpp package 
  • data.obcs: options for open boundary conditions
  • data.pkg: options for using different packages
  • data.ptracers: options and definition of tracers
  • data.salt_plume
  • data.seaice
  • DXC.bin
  • DXF.bin
  • DXG.bin
  • DXV.bin
  • DYC.bin
  • DYF.bin
  • DYG.bin
  • DYU.bin
  • eedata
  • LATC.bin
  • LATG.bin
  • LONC.bin
  • LONG.bin
  • RA.bin
  • RAS.bin
  • RAW.bin
  • RAZ.bin
  • run.arctic.hg: this is the run script
  • input_hg: input for Hg related fields
  • input_darwin: input from the DARWIN model output, including DOC, POC concentrations and fluxes etc.
  • obcs: boundary conditions
Again, we don't need to modify most of these files, except the ones marked in bold. The options inside these files are fairly straightforward and self-explanatory. A detailed description of these files can be found here.

4.2 Ocean circulation and forcing files

For the ECCO offline simulation, we need to specify the path to the offline ocean circulation data in the data.off file. Please specify the path as:

/n/home05/yxzhang/scratch/offline

For the online Arctic simulation, we also need to specify the atmospheric forcing files in data.exf:

/n/home05/yxzhang/scratch/input.arctic

These files are pretty large, so we don't need to keep multiple copies of them.
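If you want to confirm that you can read the shared forcing data before starting a run, a quick check (assuming the paths above are unchanged) is:

ls /n/home05/yxzhang/scratch/offline | head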

4.3 Boundary conditions

Because the online Arctic simulation is a regional model, it requires boundary conditions along the border of the model domain. The boundary conditions for this configuration are already prepared in the obcs directory. However, you need to prepare your own boundary conditions for your own simulations. The code I used for this purpose can be obtained from:

/n/home05/yxzhang/pub/obcs/
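For example, to take your own copy of that code (the destination directory here is just a placeholder):

cp -r /n/home05/yxzhang/pub/obcs/ YOUR_WORKING_DIR/obcs_code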

4.4 submit job

In your run directory, you can submit your job by using the job script provided:

sbatch run.hg or sbatch run.arctic.hg

We don't need to modify these script files except for the number of processors requested, which should be equal to the total number (nPx * nPy) you specified in the SIZE.h file mentioned in section 3.2. A sketch of the relevant lines is given below.
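The exact contents of run.hg and run.arctic.hg may differ, but assuming they use standard SLURM directives and mpirun, the lines to check look roughly like this (128 is only an example; it must match nPx * nPy from SIZE.h):

#SBATCH -n 128

mpirun -np 128 ./mitgcmuv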

A quick guide for submitting and managing jobs on Odyssey is available here. A GitHub link will follow when the public version is complete.

 

2.2 Obtain GTMM data

Again, you can copy my version:

cp -r /n/home02/thackray/GTMM YOUR_GTMM_DIRECTORY

This also contains a run directory for your pre-GEOS-Chem spin-up.

3 Edit and compile GTMM code

3.1 Edit GTMM code

Edit the file YOUR_SOURCE_DIRECTORY/GTMM/defineConstants.F90 to point "filepath" and "outputpath" to the relevant directories in YOUR_GTMM_DIRECTORY.

Note: you must also change "f_len" and "f_len_output" to match your defined paths (this silliness will be removed in the future); see the example below.
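Assuming f_len and f_len_output are simply the character lengths of the "filepath" and "outputpath" strings, one way to count the characters of a path (the path shown is only a placeholder) is:

echo -n "YOUR_GTMM_DIRECTORY/data/" | wc -c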

3.2 Compile GTMM code

For now, GTMM must be compiled before GEOS-Chem to make its library available for the GEOS-Chem code. In the future, this will be built into the GEOS-Chem build sequence.

cd YOUR_SOURCE_DIRECTORY/GTMM

bash compile.sh

cp gtmm YOUR_GTMM_DIRECTORY/output/

3.3 Edit GEOS-Chem code

Until the GEOS-Chem build sequence is updated, edit the file YOUR_SOURCE_DIRECTORY/Makefile_header.mk and change line 896 to read:

GTMM_NEEDED             :=1
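If your copy of Makefile_header.mk is from a slightly different GEOS-Chem version, the line number may have shifted; you can locate the setting by name instead, for example:

grep -n "GTMM_NEEDED" YOUR_SOURCE_DIRECTORY/Makefile_header.mk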

3.4 Compile GEOS-Chem code

At this point the best way to proceed is to make a 4x5 degree Hg run directory using the GEOS-Chem unit tester.

(If you'd like an easier option, you can also do:

cp -r /n/home02/thackray/gtmmrun YOUR_RUN_DIRECTORY

)

Once this is done, edit input.geos so that line 104 reads:

Use GTMM soil model?    : T
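Similarly, if your input.geos comes from a different unit tester version, line 104 may not be exact; you can find the switch by name, for example:

grep -n "GTMM soil model" YOUR_RUN_DIRECTORY/input.geos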

Now you are ready to compile GEOS-Chem in this run directory:

cd YOUR_RUN_DIRECTORY

make -j4 mpbuild

If compilation completes normally, an executable "geos.mp" will be created.

4. Run the simulation

A quick guide for submitting and managing jobs on Odyssey is available here.

4.1 Run GTMM standalone (optional?)

If you need to do a spin up of GTMM uncoupled from GEOS-Chem, you can do this:

cd YOUR_GTMM_DIRECTORY/output

Edit rungtmm.sh to point to your directory

sbatch rungtmm.sh

When it's done, you can do the following to make GEOS-Chem happy:

cp HgPools HgPools.0

4.2 Run GEOS-Chem coupled to GTMM

An example runscript is available at /n/home02/thackray/gtmmrun/dorun.sh

Edit this run script to use YOUR_RUN_DIRECTORY instead of the one defined by me.

You can submit your job by doing:

sbatch dorun.sh
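Once submitted, you can monitor the job with the standard SLURM tools, for example (YOUR_USERNAME is a placeholder):

squeue -u YOUR_USERNAME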

Sit back and wait for output!