1. General introduction
This wiki outlines the procedures for running the MIT General Circulation Model (MITgcm) Hg/POPs simulations on the Odyssey system. General information about the MITgcm can be found in the MITgcm User's Manual.
We have one type of simulation so far:
1) A nominal 1 degree x 1 degree online simulation with ECCOv4 ocean circulation data over a global domain with higher spatial resolution over the Arctic Ocean.
2. Obtain source code
Users from the Harvard BGC group can obtain a copy of the source code from:
/n/sunderland_lab/Lab/MITgcm/
Note: do NOT copy the verification folder; it takes up a huge amount of disk space.
In your home directory (~username), make an MITgcm directory and copy into it all of the folders from the Lab copy except verification. For example:
cd
mkdir MITgcm
cd MITgcm
cp -r /n/sunderland_lab/Lab/MITgcm/bin/ .
cp -r /n/sunderland_lab/Lab/MITgcm/doc/ .
cp -r /n/sunderland_lab/Lab/MITgcm/eesupp/ .
...etc.!
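If you would rather not type each cp command, a short loop copies every top-level folder except verification. This is just a convenience sketch; it assumes you are already inside your new MITgcm directory:
# Copy every folder from the Lab copy, skipping verification:
for d in /n/sunderland_lab/Lab/MITgcm/*/; do
  if [ "$(basename "$d")" != "verification" ]; then cp -r "$d" .; fi
done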
...
Alternatively, users from the Harvard BGC group can obtain a copy of the source code from the MITgcm_code repository on Bitbucket. See Running the MITgcm PFOS and Hg Simulations on Odyssey for detailed initial setup instructions.
The numerical model is contained within an execution environment support wrapper. This wrapper is designed to provide a general framework for grid-point models; MITgcm is a specific numerical model that uses the framework. Under this structure, the model is split into execution environment support code and conventional numerical model code. The execution environment support code is in the eesupp/ directory. The grid-point model code is in the model/ directory. Code execution actually starts in the eesupp/ routines, not in the model routines; for this reason, the top-level MAIN.F is in the eesupp/src/ directory. In general, end-users should not need to worry about this level. The top-level routine for the numerical part of the code is in model/src/THE_MODEL_MAIN.F.
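If you are curious, you can confirm this structure from the top of your MITgcm directory. The grep below simply lists which source files mention the numerical model's entry routine (file-name case may vary between code copies):
grep -ril "THE_MODEL_MAIN" eesupp/src model/src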
...
- Copy your code package as a separate folder in MITgcm_code/pkg/ (e.g., MITgcm_code/pkg/pfos/). If you don't know how to develop such a package, a good template to follow is the hg package in /home/geos_harvard/yanxu/MITgcm/pkg/hg/. Generally, you need to write a series of functions in your package that solve the different terms of the continuity equation. The physical transport is handled by the ptracer/ package, so you just need to focus on the source-sink terms of your pollutant(s) of interest. You also need a couple of header files to define a series of variables, and some files to handle the disk I/O.
- Hook up your code with the main program via the gchem/ package. You should modify several files, including:
- gchem_calc_tendency.F: from here you can call the functions that solve different biogeochemical processes, e.g. chemistry, surface forcing, partitioning.
- gchem_fields_load.F: from here you can call the function to load input data.
- GCHEM.h: add a trigger to enable your package, such as useHG, usePCB.
...
- gchem_init_fixed.F: from here you can call the function that initializes the diagnostics.
You can also refer to my modifications to the gchem/ package at /home/geos_harvard/yanxu/MITgcm/pkg/gchem. You can find the modifications with grep -i "yxz" *.
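A quick way to see every place gchem/ hooks into an existing tracer package is to grep for that package's calls and trigger. The commands below are a sketch, shown for the hg package in Yanxu's copy; substitute your own package prefix (e.g., PCB) as needed:
cd /home/geos_harvard/yanxu/MITgcm/pkg/gchem
# List each line that calls an HG_ routine or tests the useHG trigger:
grep -inE "HG_|useHG" gchem_calc_tendency.F gchem_fields_load.F gchem_init_fixed.F GCHEM.h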
...
- You need to comment out all calls to ALLOW_CAL in pcb_fields_load.F.
- In gchem_init_fixed.F, you need to make sure you have the line CALL PCB_PARM. Yanxu got rid of his hg_parms.F file, so a CALL HG_PARM line is missing from his gchem_init_fixed.F file. The PCB simulation still has a pcb_parms.F file, and if it isn't "turned on" by calling it from gchem_init_fixed.F, your output will be all NaNs (a quick check for this is shown after this list).
- Use online wind, ice, and solar radiation information from ECCOv4. In ECCOv1, we read wind, ice, and radiation from an offline file (e.g., archived from MERRA or GEOS-5). Now those variables are generated online. You need to do two things to activate this capability:
- Add "#ifdef USE_EXFIWR" statements to your package. The easiest way to do this is to search "USE_EXFIWR" in the HG code (
/n/sunderland_lab/MITgcm/pkg/hg/) and copy these to your own code.
- After adding the "#ifdef USE_EXFIWR" statements to your package, you need to update the names of your ice, wind, and radiation variables. You probably need to do this if your code has air-sea exchange, ice interactions, or photochemistry. In pcba_surfforcing.F, which handles air-sea exchange, I had to replace wind(i,j,bi,bj) with windo and fIce(i,j,bi,bj) with ice. If you haven't done this properly, your PTRACER output might have big blocks of missing data.
...
Here we will set up the following directories within your ~username/MITgcm/verification/global_hg_llc90/ directory:
code: Header/option or other files that are often modified.
...
Before compiling the code, you need to obtain the contents of the code/ directory. Copy all the files from /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/code/ :
cd ~username/MITgcm/verification/global_hg_llc90/
cp -r /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/code/ ./
Lastly, make empty build/ and run/ directories within your ~username/MITgcm/verification/global_hg_llc90/ directory:
cd ~username/MITgcm/verification/global_hg_llc90/
mkdir build
mkdir run
If you are running the Hg simulation, you should be all set. If you are running a different simulation (e.g., PCBs or PFOS) and only using Hg as a template, you need to modify:
...
***** WARNING: the MITgcm Hg simulation has NOT successfully run with the module configuration below on Odyssey. The error is described in section 7, here. This page will be updated as more is understood, but in the meantime please use the modules and optfile listed here, under section (b) ('used by Chris Horvat'), which have been tested and do run on Odyssey. *****
module load hpc/openmpi-intel-latest
module load hpc/netcdf-3.6.3
Then let's go to the build/ directory and build your Makefile:
cd ~username/MITgcm/verification/global_hg_llc90/build
First, generate the Makefile. Note: the "-optfile" filename below (and its contents) will need to be changed if you have to load different module versions than the specific ones listed above.
make clean
(make clean is needed if you change which modules are loaded and/or the optfile)
../../../tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_ia64_ifort+mpi_harvard3
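If genmake2 finishes without errors, the standard MITgcm build sequence follows:
make depend
make
A successful build produces an executable named mitgcmuv in the build/ directory.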
...
/n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/
6.1 Copy first batch of files
Copy these folders to ~username/MITgcm/verification/
cd ~username/MITgcm/verification/
cp -r /n/sunderland_lab/Lab/MITgcm/verification/global_oce_cs32/ ./
cp -r /n/sunderland_lab/Lab/MITgcm/verification/global_oce_input_fields/ ./
...
cd ~username/MITgcm/verification/global_hg_llc90/run
ln -s /n/sunderland_lab/Lab/eccov4_input/controls/* .
ln -s /n/sunderland_lab/Lab/eccov4_input/MITprof/* .
ln -s /n/sunderland_lab/Lab/eccov4_input/pickups/* .
ln -s /n/sunderland_lab/Lab/eccov4_input/era-interim/* .
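Since these are symbolic links, it's worth checking that none of them are dangling before you run (a broken link usually means a typo in the source path). One way, using GNU find:
find . -xtype l
Any path this prints is a link whose target does not exist.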
...
Go back to your run directory and make another directory called input_hg/ for Hg deposition input from the atmosphere:
cd .. (to get back to your run directory, assuming you were in run/initial/)
mkdir input_hg
Now fill it with your input files. If you do not have any, use the input files from sunderland_lab:
ln -s /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/input_hg/* input_hg/.
If you are running with the food web model (the default setting when you copy the code/ directory from sunderland_lab; see ~username/MITgcm/verification/global_hg_llc90/code/HG_OPTIONS.h and check whether the food web option is set to "define"), you will need to get plankton inputs.
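To see quickly which options are active in your copy, you can list every #define in that header; the food web flag will appear here if it is enabled (its exact name depends on the code version):
grep -n "^#define" ~username/MITgcm/verification/global_hg_llc90/code/HG_OPTIONS.h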
Still in your run directory:
mkdir input_darwin
ln -s /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/input_darwin/* input_darwin/.
6.5 Control files
Still in your run/ directory, make a control/ subdirectory:
mkdir control
Move all the control files into this folder:
mv xx_* control/
cp /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/control/wt_* control/
6.6 data* files
If you're running an Hg simulation, copy data* files to your run/ directory from here:
cp /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/data* .
Note: within the file called "data" are variables that set how long a run you want to do (in # of timesteps) and the length of a timestep (in seconds).
If you're running the PCB and DARWIN simulations, copy data* files to your run/ directory from Svante. If you don't know how, Helen Amos can help you with this. The bottom line is that you cannot reuse the old files from the older ECCO version simulations.
6.7 Submit job
Copy the submit script into run/, and modify it to any name you like:
cp /n/sunderland_lab/Lab/MITgcm/verification/global_hg_llc90/run/run.mehg .
Then we can submit the job to the queue. To submit:
sbatch YOUR_RUN_SCRIPT
If your run finishes without any problems, the very last line of your STDOUT.0000 file should indicate the model 'ENDED NORMALLY'.
An example run script for running an 8-hour test run with 1-hour timesteps is located here:
/n/home09/hmh/MITgcm/verification/global_hg_llc90/run_original_fixed/run.8hr.testrun.chrismodules.96core
You may need more than 1 hour for the run to complete; 120 minutes is conservative. This is for the data file configuration located here, which sets the timestep length and run duration for the 8-hour test:
/n/home09/hmh/MITgcm/verification/global_hg_llc90/run_original_fixed/data
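For reference, a submit script for this kind of run is typically a short SLURM batch file. The sketch below is illustrative only: the job name, partition, core count, and wall time are assumptions, and the module lines must match whatever you compiled with (section (b), Chris Horvat, is shown here):
#!/bin/bash
# Job name, MPI task count, wall time (minutes), and partition are placeholders:
#SBATCH -J hg_test
#SBATCH -n 96
#SBATCH -t 120
#SBATCH -p general
# Load the same modules that were loaded at compile time:
module load centos6/openmpi-1.7.2_intel-13.0.079
module load centos6/netcdf-4.3.0_intel-13.0.079
# Run the executable built in ../build (copy or link it into run/ first):
mpirun -np 96 ./mitgcmuv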
6.8 How to check on your run during and after completion
...
sacct -j JOBID --format=JobID,JobName,ReqMem,MaxRSS,Elapsed
To learn how to use the output from the above, see https://rc.fas.harvard.edu/resources/odyssey-quickstart-guide/ , section "A note on requesting memory".
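While the job is running, two standard commands are handy for keeping an eye on it (the username below is a placeholder; run the tail command from your run/ directory):
squeue -u username
tail -f STDOUT.0000
squeue shows whether the job is still pending or running; tail -f follows the model log as it is written.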
6.9 Debugging tips
...
1. If your run crashes, check the following files for error messages:
...
Documentation for ECCOv4, the physical model configuration of our simulations [pdf], and the associated publication [pdf].
Processing model output and regridding input files involves the gcmfaces/ package. Documentation for gcmfaces/ is available here [pdf].
Special thanks to Gael Forget, Stephanie Dutkiewicz and Jeffery Scott at MIT.
...
This is an issue with the KPP chemical solver not converging and requires a change in how the code is compiled. More will be updated later, but for now follow the instructions for loading modules here (section (b), Chris Horvat) using the optfile available for download here.
...
You should keep all your optfiles within your directory: ~username/MITgcm/tools/build_options/ .
1. Old module system:
a. Standard, as written in the instructions above. The optfile is already in your build_options directory if you followed the instructions to copy folders from sunderland_lab.
module load hpc/openmpi-intel-latest
module load hpc/netcdf-3.6.3
../../../tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_ia64_ifort+mpi_harvard3
Note: this will load the following versions: intel compiler 13.0.079; openmpi 1.6.2.
b. Used by Chris Horvat. Download the optfile by clicking here, use scp to copy the file to Odyssey, then mv it into your build_options directory. Alternatively, you can copy it from /n/home09/hmh/MITgcm/tools/build_options/linux_amd64_ifort_mpi_odyssey2
module load centos6/openmpi-1.7.2_intel-13.0.079
module load centos6/netcdf-4.3.0_intel-13.0.079
../../../tools/genmake2 -mods=../code -optfile=../../../tools/build_options/linux_amd64_ifort_mpi_odyssey2 -mpi -enable=mnc
2. New module system:
Download optfile by clicking here.
...
1. Load Lmod, Odyssey's new module system. At the command line, in any directory, enter:
source new-modules.sh
2. Load intel compiler:
module load intel/13.0.079-fasrc01
3. Find out which modules are compatible with this intel version:
module avail
Right now, the list looks something like this:
openmpi/1.6.5-fasrc01
openmpi/1.8.1-fasrc01
openmpi/1.8.3-fasrc01
netcdf/3.6.3-fasrc01
This means you can choose any of the 3 openmpi versions, but there is only one compatible netCDF version.
4. Load your openmpi module of choice and netCDF module. As an example, here we'll choose openmpi 1.6.5.
module load openmpi/1.6.5-fasrc01
module load netcdf/3.6.3-fasrc01
5. Find out what the filepaths are for these modules:
printenv
Now look for "LD_LIBRARY_PATH" and "CPATH" (search within the terminal window). For the modules above, it should look something like this:
LD_LIBRARY_PATH=/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/netcdf/3.6.3-fasrc01/lib64:/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/lib:/n/sw/intel_cluster_studio-2013/lib/intel64:/lsf/7.0/linux2.6-glibc2.3-x86_64/lib
CPATH=/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/netcdf/3.6.3-fasrc01/include:/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/include:/n/sw/intel_cluster_studio-2013/composerxe/include/intel64:/n/sw/intel_cluster_studio-2013/composerxe/include
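If the full printenv output is hard to scan, you can print just these two variables with one path per line:
echo $LD_LIBRARY_PATH | tr ':' '\n'
echo $CPATH | tr ':' '\n'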
6. Create a new optfile, by making a copy of a previous one, within your ~username/MITgcm/tools/build_options/ directory.
cd ~username/MITgcm/tools/build_options/
cp linux_ia64_ifort+mpi_harvard3 linux_ia64_ifort+mpi_harvard_test (just an example; you can change the filename to whatever you want)
7. Open the file you've just copied (e.g., with emacs, nano, vi, or whatever text editor you like), and look for the following lines, which you will want to edit (note: they may be slightly different; this is an example):
INCLUDES='-I/n/sw/openmpi-1.6.2_intel-13.0.079/include -I/n/sw/intel_cluster_studio-2013/mkl/include'
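Based on the CPATH output shown in step 5, the edited INCLUDES line would point at the new netCDF and openmpi include directories, e.g. (your paths will differ if you loaded different module versions):
INCLUDES='-I/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/netcdf/3.6.3-fasrc01/include -I/n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/include -I/n/sw/intel_cluster_studio-2013/mkl/include'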
...
cd /n/sw/fasrcsw/apps/Comp/intel/13.0.079-fasrc01/openmpi/1.6.5-fasrc01/include
5. More information on Odyssey modules & useful commands:
https://rc.fas.harvard.edu/resources/documentation/software-on-odyssey/modules/
module purge - clears all loaded modules
module list - shows currently loaded modules