GPU Computing (AP 278)

Odyssey and GPU Computing on Odyssey

Before proceeding, familiarize yourself with the basics of Odyssey and of GPU computing on Odyssey:

https://rc.fas.harvard.edu/resources/odyssey-quickstart-guide/

https://rc.fas.harvard.edu/resources/documentation/gpgpu-computing-on-odyssey/

CUDA

Compiling and running CUDA code

1) Log in to a node with a GPU. For AP 278, use the holyseasgpu partition:

srun --pty --x11=first -p holyseasgpu --mem 4000 -t 0-10:00 --gres=gpu:1 bash

2) To list the available CUDA versions, type in a terminal:

module-query cuda

3) Load one of the available modules, for example cuda/9.2.88 or cuda/8.0 (use the exact module name that module-query reports):

module load cuda/9.2.88

4) Write or obtain CUDA code. The three example programs listed below come from the following reference; read it to understand how the three versions differ:

https://devblogs.nvidia.com/even-easier-introduction-cuda/

add.cu

add_block.cu

add_grid.cu

5) Compile the code:

For example, to compile add.cu, type in a terminal:

nvcc add.cu -o add_cuda

6) Run the code in interactive mode (test runs only):

./add_cuda

7) Running batch jobs

Create a script (say runscript.sh) to run the executable by copying and pasting the following lines:

#!/bin/bash
#SBATCH -p holyseasgpu #Partition to submit to 
#SBATCH -n 1 #Number of cores 
#SBATCH --gres=gpu:1 #Number of GPUs
#SBATCH -t 5 #Runtime in minutes 
#SBATCH --mem-per-cpu=100 #Memory per cpu in MB (see also --mem)
module load cuda/9.2.88-fasrc01
time ./add_cuda

8) Run the script in batch mode with:

sbatch runscript.sh

Links on CUDA (with tutorials and sample CUDA programs)

1) On Odyssey, after you load a cuda module you can access sample programs from:

      $CUDA_HOME/samples

2) https://devblogs.nvidia.com/even-easier-introduction-cuda/

3) https://devblogs.nvidia.com/easy-introduction-cuda-fortran/

4) https://www.pgroup.com/resources/cudafortran.htm

OpenACC

On Odyssey the PGI OpenACC compiler suite is installed in /n/seasfs03/IACS/ap278/pgi/. To make the compilers (pgcc, pgc++, pgf90, etc.) available in your path, add the following lines to your .bashrc file (assumes you are using bash, which is the default shell):

export PGI=/n/seasfs03/IACS/ap278/pgi/
export PATH=/n/seasfs03/IACS/ap278/pgi/linux86-64/18.4/bin:$PATH
export MANPATH=$MANPATH:/n/seasfs03/IACS/ap278/pgi/linux86-64/18.4/man
export LM_LICENSE_FILE=/n/seasfs03/IACS/ap278/pgi/license.dat

After adding these lines to your ~/.bashrc, apply them in the current terminal with:

source ~/.bashrc

or

. ~/.bashrc

Alternatively, log out and log back in.

Some useful commands

GPU information: nvidia-smi (short), pgaccelinfo (long)

performance profiler: pgprof (For more info: https://www.pgroup.com/resources/docs/18.5/pdf/pgi18profug.pdf)

Compiling and running code with OpenACC directives

First compile your source file containing OpenACC directives (say code_acc.c or code_acc.f90; see below for example programs).

Note that pgcc and pgf90 should be available in your path for this to succeed (see above for instructions).

For a C program:

pgcc -acc code_acc.c -Minfo=accel

For a Fortran program:

pgf90 -acc code_acc.f90 -Minfo=accel

Either command creates an executable named a.out. The option -Minfo=accel makes the compiler report how it parallelized the code for the accelerator.

Slurm script for running the job on Odyssey:

#!/bin/bash 
#SBATCH -N 1  #Number of nodes 
#SBATCH -p holyseasgpu  #Partition to submit to 
#SBATCH --ntasks-per-node 2
#SBATCH --gres=gpu:1
#SBATCH -t 5  #Runtime in minutes 
./a.out

An OpenACC example

1) Get the sample code (see reference 3 in the OpenACC links below, and watch its short video tutorial before working through this example):

git clone https://github.com/parallel-forall/cudacasts
cd cudacasts/ep3-first-openacc-program

or

cp -r /n/seasfs03/IACS/ap278/cudacasts/ep3-first-openacc-program/ .
cd ep3-first-openacc-program

2) Compile the serial (non-OpenACC) code:

pgcc laplace2d.c -o a.out_serial

3) Run the "serial" version and time it:

time ./a.out_serial

4) Compile the code with OpenACC directives:

pgcc -acc laplace_acc.c -o a.out_acc -Minfo=accel

5) Run the acc-executable:

time ./a.out_acc

Links on OpenACC (with tutorials and sample OpenACC programs)

1) OpenACC example programs

    On Odyssey, you can find the OpenACC example programs in:

       /n/seasfs03/IACS/ap278/pgi/linux86-64/2018/examples/OpenACC/

2) The following links are very good general references:

    https://devblogs.nvidia.com/parallelforall/openacc-example-part-1/

    https://devblogs.nvidia.com/openacc-example-part-2/

    https://www.pgroup.com/resources/accel.htm?utm_source=nvidia_otk&utm_medium=web_link&utm_term=download

3) Excellent reference:

    https://devblogs.nvidia.com/cudacasts-episode-3-your-first-openacc-program/

   (Contains excellent video tutorials. Recommended: The video "Your First OpenACC Program" (7.5 minutes).)

   For sample (laplace) code:

   https://github.com/parallel-forall/cudacasts

4) Introductory OpenACC tutorial (free, but requires an account):

   https://nvidia.qwiklab.com/quests/3?locale=en

5) https://www.openacc.org/get-started

6) https://www.pgroup.com/resources/docs/18.4/x86/openacc-gs/index.htm

7) https://docs.computecanada.ca/wiki/OpenACC_Tutorial

8) http://web.stanford.edu/class/cme213/files/lectures/Lecture_14_openacc2017.pdf

Copyright © 2024 The President and Fellows of Harvard College