...
The following are changes you can make within your SBATCH job files:
Get emails with job statuses:
#SBATCH --mail-user=your_email@seas.harvard.edu
#SBATCH --mail-type=ALL
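ALL covers every event (begin, end, fail, requeue, etc.). If that is too noisy, standard Slurm also accepts narrower values such as BEGIN, END, FAIL, REQUEUE, and TIME_LIMIT:
# E.g., email only when the job finishes or fails
#SBATCH --mail-type=END,FAIL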
There are many partitions on Cannon that you can use. By far the best way to learn which partitions you have access to is to run:
spart
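If spart is not installed in your environment, the stock sinfo command gives a rougher but similar view:
# One summary line per partition visible to you
sinfo --summarize
# Or: partition name, time limit, node count, and CPUs per node
sinfo --format="%P %l %D %c"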
Regardless, these are commonly used partitions:
seas_compute: 90 compute nodes with a total of 4,488 cores that are available to all SEAS researchers.
seas_gpu: 43 nodes with 195 GPUs and 2,368 cores that are available to all SEAS researchers who need a GPU.
sapphire: 192 nodes with 21,504 cores available to any researcher at Harvard.
gpu: 36 nodes with 144 GPUs available to any researcher at Harvard who needs a GPU.
gpu_requeue: 172 nodes with 762 GPUs, available to any researcher at Harvard who needs GPU nodes and wants jobs to start sooner; be aware that your jobs can be requeued by researchers with higher priority (i.e., they use Cannon less frequently, have purchased more hardware, etc.).
shared: 300 nodes with a total of 14,400 cores available to any researcher at Harvard who DOES NOT need a GPU.
serial_requeue: 1,475 nodes with a total of 90,820 cores, available to any researcher who DOES NOT need a GPU and wants jobs to start sooner; be aware that your jobs can be requeued by researchers with higher priority (they use Cannon less frequently, have purchased more hardware, etc.). If you have longer-running jobs, see DMTCP below.
bigmem: 4 nodes, each with nearly a terabyte of memory, with a total of 448 cores, available to any researcher who does not need a GPU.
Your lab’s partition: Talk with your lab members. You may have one.
To use these partitions, list several in a comma-separated string; Slurm will run the job on whichever one can start it earliest:
CPU
#SBATCH --partition=seas_compute,shared,serial_requeue
GPU
#SBATCH --partition=seas_gpu,gpu,gpu_requeue
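Putting the CPU pieces together, here is a minimal batch-script sketch; the job name, time limit, memory, and program are placeholders, not site defaults:
#!/bin/bash
# Minimal CPU job sketch; adjust the placeholders to your workload
#SBATCH --job-name=my_cpu_job
#SBATCH --partition=seas_compute,shared,serial_requeue
#SBATCH -n 1
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --mail-user=your_email@seas.harvard.edu
#SBATCH --mail-type=ALL

./my_program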
Declare the number of GPUs that you need:
#SBATCH --gres=gpu:1
Use a specific type of GPU (in this case, the latest available):
#SBATCH --constraint=a100
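For example, a GPU job that wants one A100 and is willing to land on any of the GPU partitions above would combine the three directives; a minimal sketch:
#SBATCH --partition=seas_gpu,gpu,gpu_requeue
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100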
More constraints are possible:
Network (MPI jobs): holyhdr, holyib, bosib
GPUs (GPU jobs): a100 (2020-), a40 (2020-), rtx2080ti (2018-2020), v100 (2017-2020), titanv (2017-2018), 1080 (2017-2018), titanx (2016-2017), m40 (2015-2017), k80 (2014-2015), k20m (2012-2014)
Processor: intel, amd
Processor Family: icelake (Intel 2019-), cascadelake (Intel 2019-), skylake (Intel 2015-2019), broadwell (Intel 2014-2018), haswell (Intel 2013-), ivybridge (Intel 2012-2015), westmere (Intel 2010-), abudhabi (AMD 2012-2017)
x86 Extensions: avx512 (Intel 2016-), avx2 (Intel/AMD 2011-), avx (Intel/AMD 2011-), fma4 (AMD 2011-2014), interlagos (AMD 2003-2017)
CUDA Versions: cc8.6, cc7.5, cc7.0, cc6.1, cc6.0, cc5.2, cc3.7, cc3.5
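To check which features a given node actually advertises (and therefore which --constraint values will match it), the stock sinfo command can print each node's feature list:
# Node names alongside their advertised features
sinfo --format="%N %f"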
E.g., if you want the latest available processor and the whole node to yourself:
#SBATCH --constraint=icelake
#SBATCH -N 1
#SBATCH --exclusive
You can also accept more than one option; the | is a logical OR, so Slurm will use whichever is available:
#SBATCH --constraint="icelake|cascadelake"
...