Code Block
#!/bin/bash 
#SBATCH -n 128
#SBATCH -N 2
#SBATCH -t 60
#SBATCH -p regal
#SBATCH --mem-per-cpu=3750
#SBATCH --mail-type=ALL 
#EOC
#------------------------------------------------------------------------------
#                  GEOS-Chem Global Chemical Transport Model                  !
#------------------------------------------------------------------------------
#BOP
#
# !IROUTINE: run.mitgcm.96np.debug
#
# !DESCRIPTION: Script to run a debug MITgcm Hg simulation with 96 CPUs.
#\\
#\\
# !CALLING SEQUENCE:
#  sbatch run.mitgcm.96np.debug   # To submit a batch job
#
# !REMARKS:
#  Consider requesting 2 entire nodes (-n 128 -N 2), which will prevent
#  outside jobs from slowing down your simulation.
#
#  Also note: Make your timestep edits in "data.debug_run", which will
#  automatically be copied to "data" by this script.
#
# !REVISION HISTORY:
#  17 Feb 2015 - R. Yantosca - Initial version
#EOP
#------------------------------------------------------------------------------
#BOC

# Make sure we apply the .bashrc_mitgcm settings
source ~/.bashrc_mitgcm

# Copy run-time parameter input files for the 96 CPU run
cp -f data.debug_run   data
cp -f data.exch2.96np  data.exch2

# Remove old output files
rm -f STDOUT.*
rm -f STDERR.*
rm -f PTRACER*

# Run MITgcm with 96 CPUs
time -p ( mpirun -np 96  ./mitgcmuv )
exit 0
#EOC

 

The run.mitgcm* scripts all do the following things:

  1. Get the proper compiler and library settings from your ~/.bashrc_mitgcm file.

  2. Reserve CPUs for the MITgcm run.

    • NOTE: For MITgcm production runs, we recommend that you request 128 CPUs (i.e. 2 entire nodes) even though the MITgcm only uses 96.  This will reserve both nodes exclusively for your MITgcm simulation, and will prevent other Odyssey jobs from running on the same nodes and competing for resources.

  3. Create the proper data file for your simulation from a template.  This file contains basic information for the simulation, including:

    • The number of timesteps for the simulation to run (i.e. nTimeSteps);
    • How frequently diagnostics are saved to disk (i.e. dumpFreq);
    • How frequently statistics are written to the log file (i.e. monitorFreq).

  4. Create the proper data.exch2 file for your simulation from a template.

    • The data.exch2 file, which is described below, contains information about the tiles used for the horizontal grid specification.

  5. Run the MITgcm simulation and print the real (wall-clock), user, and system time in seconds.
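
The timing in step 5 comes from the bash "time" keyword. The sketch below shows the same pattern with a stand-in command (sleep) in place of the mpirun call, and captures the report so that it can be inspected. The variable name "timing" is only for illustration.

```shell
#!/bin/bash

# Stand-in for "mpirun -np 96 ./mitgcmuv": any command can be timed this way.
# With -p, the bash "time" keyword reports three lines (real, user, sys),
# each in seconds, on standard error.
timing=$( { time -p ( sleep 1 ) ; } 2>&1 )

# Show the captured report
echo "$timing"
```

In the run scripts the report is not captured, so it ends up in the job's standard error output.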

The data.exch2 files

The data.exch2 input file specifies tiling information for the number of CPUs used.  This tells the MITgcm to place a certain number of grid boxes on each CPU.  For your convenience, we have created two separate data.exch2 files, one that can be used with 13 CPU simulations (data.exch2.13np) and one for 96 CPU simulations (data.exch2.96np).  The run.mitgcm script that you use will select the proper data.exch2 file for your simulation.
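
The selection step can be sketched in bash as follows. This is only an illustration, not the code used in the run.mitgcm scripts (which copy one fixed file). The function name select_exch2 is hypothetical, and the sketch assumes that the templates follow the data.exch2.<N>np naming pattern.

```shell
#!/bin/bash

# Hypothetical helper: copy the data.exch2 template for a given CPU count
# into place as "data.exch2".  Assumes templates are named data.exch2.<N>np.
select_exch2() {
    local ncpus="$1"
    local template="data.exch2.${ncpus}np"

    # Stop early if there is no template for this CPU count
    if [[ ! -f "$template" ]]; then
        echo "No template found: $template" >&2
        return 1
    fi

    cp -f "$template" data.exch2
}
```

Calling "select_exch2 96" in the run directory has the same effect as "cp -f data.exch2.96np data.exch2".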

data.exch2.13np

The data.exch2.13np file contains the following namelist data declaration, which is used to set up the horizontal grid for 13 CPUs.

Code Block
 &W2_EXCH2_PARM01
  W2_printMsg          = 0                                                    ,
  W2_mapIO             = 1                                                    ,
  preDefTopol          = 0                                                    ,
#==============================================================================
#-- 5 facets llc_120 topology (drop facet 6 and its connection):
#==============================================================================
  dimsFacets(1:10)     = 90, 270, 90, 270, 90, 90, 270, 90, 270, 90           ,
  facetEdgeLink(1:4,1) = 3.4, 0. , 2.4, 5.1                                   ,
  facetEdgeLink(1:4,2) = 3.2, 0. , 4.2, 1.3                                   ,
  facetEdgeLink(1:4,3) = 5.4, 2.1, 4.4, 1.1                                   ,
  facetEdgeLink(1:4,4) = 5.2, 2.3, 0. , 3.3                                   ,
  facetEdgeLink(1:4,5) = 1.4, 4.1, 0. , 3.1                                   ,
/

...

data.exch2.96np

The data.exch2.96np file is used to set up the horizontal grid for 96 CPUs.  It contains the same namelist variables as data.exch2.13np, plus an additional variable named blanklist, which is used to set certain tiles to zero.

...

For your convenience, we provide several data files that you can use to schedule MITgcm simulations of different lengths.  These files are identical except for the time stepping parameters:

| Parameter | Units | Description |
|---|---|---|
| nIter0 | 1 | Index of the first iteration. This is usually 1 (but you can make it 0 if you want). If this variable is set to a nonzero value, the model will look for a "pickup" file pickup.0000nIter0 to restart the integration. |
| nTimeSteps | 1 | Number of time steps that you want the MITgcm simulation to run.  You can change the ... |
| deltaTClock | s | The model "clock" timestep.  This determines the IO frequencies and is used in tagging output. |
| deltaTmom | s | Timestep for momentum equations.  This can be set to the same value as deltaTClock. |
| deltaTtracer | s | Timestep for tracer equations.  This can be set to the same value as deltaTClock. |
| deltaTfreesurf | s | Timestep for free surface equations.  This can be set to the same value as deltaTClock. |
| pChkptFreq | s | Controls the output frequency (in seconds) of permanent checkpoint files.  See MITgcm manual section 1.5.1. |
| chkptFreq | s | Controls the output frequency (in seconds) of rolling checkpoint files.  See MITgcm manual section 1.5.1. |
| taveFreq | s | Controls the frequency (in seconds) of saving time-averaged diagnostic quantities. |
| dumpFreq | s | Controls the frequency (in seconds) with which the instantaneous state of the model is saved. |
| monitorFreq | s | Sets the interval between diagnostics written out to the text stdout stream (i.e. to the terminal or the files STDOUT.*). It supplies statistics on model variables (max, mean, etc.) and also checks the CFL values. It can be quite expensive, so it should not be done every timestep but perhaps every 10-50 timesteps. |
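
Because monitorFreq is given in seconds while the advice above is given in timesteps, the two can be related by multiplying by deltaTClock. A minimal bash sketch, in which the function name steps_to_seconds is hypothetical and the timestep is assumed to be a whole number of seconds:

```shell
#!/bin/bash

# Hypothetical helper: convert an interval given in timesteps into seconds,
# assuming deltaTClock is a whole number of seconds.
steps_to_seconds() {
    local nsteps="$1"
    local delta_t_clock="$2"
    echo $(( nsteps * delta_t_clock ))
}

# Monitor every 10 timesteps with a 3600 s clock timestep
steps_to_seconds 10 3600    # prints 36000
```

The printed value is what you would enter for monitorFreq to get monitor output every 10 timesteps.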

   

data.debug_run

The data.debug_run file is used to submit a 10-hour MITgcm simulation.  The time stepping settings are as follows.

Code Block
 &PARM03
 nIter0                     = 1                                              ,
 nTimeSteps                 = 10                                             ,
#
 forcing_In_AB              = .FALSE.                                        ,
 momDissip_In_AB            = .FALSE.                                        ,
#
#when using the cd scheme:
# epsAB_CD                  = 0.25                                           ,
# tauCD                     = 172800.0                                       ,
#
# Set 1-hour timesteps
#
 deltaTmom                  = 3600.                                          ,
 deltaTtracer               = 3600.                                          ,
 deltaTfreesurf             = 3600.                                          ,
 deltaTClock                = 3600.                                          ,
#
#when using ab2:
# abEps                     = 0.1                                            ,
#                                         
#when using ab3:                                 
 doAB_onGtGs                = .FALSE.                                        ,
 alph_AB                    = 0.5                                            ,
 beta_AB                    = 0.281105                                       ,
#
# Time averaging and dumping parameters
#                                         
 pChkptFreq                 = 315576000.0                                    ,
 chkptFreq                  = 315576000.0                                    ,
 taveFreq                   = 360000.0                                         ,
 dumpFreq                   = 360000.0                                         ,
 monitorFreq                = 360000.0                                       ,
 dumpInitAndLast            = .TRUE.                                         ,
 adjDumpFreq                = 3155760000.0                                   ,
 adjMonitorFreq             = 360000.0                                       ,
 pickupStrictlyMatch        = .FALSE.                                        ,
# pickupSuff                ='0000166548'                                    ,  
/ 

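As you can see, we set the basic timestep (deltaTClock) to 3600 seconds = 1 hour, and then run for 10 timesteps = 10 hours (36000 seconds) total.  The diagnostic frequencies (taveFreq, dumpFreq, monitorFreq) are all set to 360000 seconds.  Note that this is ten times the length of the run, so reduce these values if you want diagnostics saved out within the 10-hour simulation.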
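
The run length implied by these settings can be checked with shell arithmetic. The sketch below copies the values by hand from the namelist rather than parsing the data file, and the variable names run_seconds and run_hours are only for illustration.

```shell
#!/bin/bash

# Values copied by hand from the PARM03 namelist (not parsed from the file)
nTimeSteps=10
deltaTClock=3600    # seconds
dumpFreq=360000     # seconds

# Total simulated time in seconds and in hours
run_seconds=$(( nTimeSteps * deltaTClock ))
run_hours=$(( run_seconds / 3600 ))
echo "Run length: ${run_seconds} s (${run_hours} h)"

# Compare the dump interval with the run length
if (( dumpFreq > run_seconds )); then
    echo "dumpFreq (${dumpFreq} s) is longer than the run"
fi
```

The same check applies to taveFreq and monitorFreq, which have the same value in this file.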

 

 

 

Debug run

To submit a debugging run (on 13 CPUs), type the following commands:

...