Compiling Source Code
This section provides some examples of compiling source code on Anvil.
Compiling Serial Programs
A serial program is a single process that executes as a sequential stream of instructions on one processor core. Compilers for serial C, C++, and Fortran programs are available.
Here are a few sample serial programs:
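One such program, sketched here in C (the filename serial_hello.c is illustrative, not necessarily one of the original downloadable samples):

```c
/* serial_hello.c (illustrative name) -- a serial program: a single
 * process executing one sequential stream of instructions on one core */
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}
```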
To load a compiler, enter one of the following:
$ module load intel
$ module load gcc
$ module load aocc
Language | Intel Compiler | GNU Compiler | AOCC Compiler
---|---|---|---
Fortran 77 | ifort myprogram.f -o myprogram | gfortran myprogram.f -o myprogram | flang myprogram.f -o myprogram
Fortran 90 | ifort myprogram.f90 -o myprogram | gfortran myprogram.f90 -o myprogram | flang myprogram.f90 -o myprogram
Fortran 95 | ifort myprogram.f90 -o myprogram | gfortran myprogram.f95 -o myprogram | flang myprogram.f90 -o myprogram
C | icc myprogram.c -o myprogram | gcc myprogram.c -o myprogram | clang myprogram.c -o myprogram
C++ | icpc myprogram.cpp -o myprogram | g++ myprogram.cpp -o myprogram | clang++ myprogram.cpp -o myprogram
The Intel, GNU, and AOCC compilers produce no output for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; since free-form source is accepted regardless of the standard version, you may use ".f90" for any Fortran code.
Compiling MPI Programs
OpenMPI, Intel MPI (IMPI) and MVAPICH2 are implementations of the Message-Passing Interface (MPI) standard. Libraries for these MPI implementations and compilers for C, C++, and Fortran are available on Anvil.
Language | Header Files
---|---
Fortran 77 | include 'mpif.h'
Fortran 90 | use mpi
Fortran 95 | use mpi
C | #include <mpi.h>
C++ | #include <mpi.h>
Here are a few sample programs using MPI:
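One such program, sketched here in C (the filename mpi_hello.c is illustrative): every rank reports its rank number and the size of MPI_COMM_WORLD.

```c
/* mpi_hello.c (illustrative name) -- each MPI process prints a greeting */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of ranks */

    printf("hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                          /* shut down MPI */
    return 0;
}
```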
To see the available MPI libraries:
$ module avail openmpi
$ module avail impi
$ module avail mvapich2
Language | Intel Compiler with Intel MPI (IMPI) | Intel/GNU/AOCC Compiler with OpenMPI/MVAPICH2
---|---|---
Fortran 77 | mpiifort myprogram.f -o myprogram | mpif77 myprogram.f -o myprogram
Fortran 90 | mpiifort myprogram.f90 -o myprogram | mpif90 myprogram.f90 -o myprogram
Fortran 95 | mpiifort myprogram.f90 -o myprogram | mpif90 myprogram.f90 -o myprogram
C | mpiicc myprogram.c -o myprogram | mpicc myprogram.c -o myprogram
C++ | mpiicpc myprogram.cpp -o myprogram | mpicxx myprogram.cpp -o myprogram
The Intel, GNU, and AOCC compilers produce no output for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; since free-form source is accepted regardless of the standard version, you may use ".f90" for any Fortran code.
More documentation on these MPI libraries is available from the OpenMPI, Intel MPI, and MVAPICH2 projects.
Compiling OpenMP Programs
All compilers installed on Anvil include OpenMP functionality for C, C++, and Fortran. An OpenMP program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over processor cores in a single compute node without the need for MPI communications.
Language | Header Files
---|---
Fortran 77 | include 'omp_lib.h'
Fortran 90 | use omp_lib
Fortran 95 | use omp_lib
C | #include <omp.h>
C++ | #include <omp.h>
Sample programs illustrate task parallelism of OpenMP:
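For instance, a minimal sketch in C using the sections construct (the filename omp_sections.c is illustrative), where independent tasks are handed to different threads:

```c
/* omp_sections.c (illustrative name) -- task parallelism: each section is
 * an independent task, picked up by whichever thread is available */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        printf("task A executed by thread %d\n", omp_get_thread_num());

        #pragma omp section
        printf("task B executed by thread %d\n", omp_get_thread_num());
    }
    return 0;
}
```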
A sample program illustrates loop-level (data) parallelism of OpenMP:
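A minimal sketch in C (the filename omp_loop.c is illustrative), where the iterations of a loop are divided among the threads:

```c
/* omp_loop.c (illustrative name) -- loop-level (data) parallelism: the
 * iterations of the for loop are divided among the threads */
#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    int a[N];

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = i * i;
        printf("iteration %d computed by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```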
To load a compiler, enter one of the following:
$ module load intel
$ module load gcc
$ module load aocc
Language | Intel Compiler | GNU Compiler | AOCC Compiler
---|---|---|---
Fortran 77 | ifort -qopenmp myprogram.f -o myprogram | gfortran -fopenmp myprogram.f -o myprogram | flang -fopenmp myprogram.f -o myprogram
Fortran 90 | ifort -qopenmp myprogram.f90 -o myprogram | gfortran -fopenmp myprogram.f90 -o myprogram | flang -fopenmp myprogram.f90 -o myprogram
Fortran 95 | ifort -qopenmp myprogram.f90 -o myprogram | gfortran -fopenmp myprogram.f95 -o myprogram | flang -fopenmp myprogram.f90 -o myprogram
C | icc -qopenmp myprogram.c -o myprogram | gcc -fopenmp myprogram.c -o myprogram | clang -fopenmp myprogram.c -o myprogram
C++ | icpc -qopenmp myprogram.cpp -o myprogram | g++ -fopenmp myprogram.cpp -o myprogram | clang++ -fopenmp myprogram.cpp -o myprogram
The Intel, GNU, and AOCC compilers produce no output for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; since free-form source is accepted regardless of the standard version, you may use ".f90" for any Fortran code.
More documentation on OpenMP is available from the OpenMP Architecture Review Board at openmp.org.
Compiling Hybrid Programs
A hybrid program combines MPI and shared-memory parallelism to take advantage of compute clusters with multi-core compute nodes. Libraries for OpenMPI, Intel MPI (IMPI), and MVAPICH2, as well as OpenMP-capable compilers for C, C++, and Fortran, are available.
Language | Header Files
---|---
Fortran 77 | include 'mpif.h' and include 'omp_lib.h'
Fortran 90 | use mpi and use omp_lib
Fortran 95 | use mpi and use omp_lib
C | #include <mpi.h> and #include <omp.h>
C++ | #include <mpi.h> and #include <omp.h>
A hybrid program can combine MPI with either the task parallelism or the loop-level (data) parallelism of OpenMP. For example:
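A minimal sketch in C (the filename hybrid_hello.c is illustrative): MPI distributes ranks across nodes while OpenMP threads share memory within each rank.

```c
/* hybrid_hello.c (illustrative name) -- MPI between processes, OpenMP
 * threads within each process */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, provided;

    /* request FUNNELED: only the main thread will make MPI calls */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    printf("rank %d, thread %d\n", rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}
```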
To see the available MPI libraries:
$ module avail impi
$ module avail openmpi
$ module avail mvapich2
Language | Intel Compiler with Intel MPI (IMPI) | Intel/GNU/AOCC Compiler with OpenMPI/MVAPICH2
---|---|---
Fortran 77 | mpiifort -qopenmp myprogram.f -o myprogram | mpif77 -fopenmp myprogram.f -o myprogram
Fortran 90 | mpiifort -qopenmp myprogram.f90 -o myprogram | mpif90 -fopenmp myprogram.f90 -o myprogram
Fortran 95 | mpiifort -qopenmp myprogram.f90 -o myprogram | mpif90 -fopenmp myprogram.f90 -o myprogram
C | mpiicc -qopenmp myprogram.c -o myprogram | mpicc -fopenmp myprogram.c -o myprogram
C++ | mpiicpc -qopenmp myprogram.cpp -o myprogram | mpicxx -fopenmp myprogram.cpp -o myprogram
The Intel, GNU, and AOCC compilers produce no output for a successful compilation. When an OpenMPI or MVAPICH2 wrapper invokes the Intel compiler underneath, use -qopenmp in place of -fopenmp. Also, the Intel compiler does not recognize the suffix ".f95"; since free-form source is accepted regardless of the standard version, you may use ".f90" for any Fortran code.
Compiling NVIDIA GPU Programs
The Anvil cluster contains GPU nodes that support CUDA and OpenCL. See the detailed hardware overview for the specifics on the GPUs in Anvil. This section focuses on using CUDA.
A simple CUDA program has a basic workflow:
- Initialize an array on the host (CPU).
- Copy array from host memory to GPU memory.
- Apply an operation to array on GPU.
- Copy array from GPU memory to host memory.
Here is a sample CUDA program:
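The original gpu_hello.cu sample is not reproduced here; below is an illustrative sketch following the workflow above. (Judging by the sample run shown later in this section, the actual program also selects a GPU from a command-line argument and prints "No GPU specified, using first GPU" when none is given; this sketch simply uses the default device.)

```cuda
/* gpu_hello.cu (illustrative sketch) -- initialize a char array on the
 * host, copy it to the GPU, transform it there, copy it back, print it */
#include <stdio.h>

__global__ void build_string(char *s)
{
    s[threadIdx.x] += 1;   /* one thread per character: undo the host's -1 */
}

int main(void)
{
    char host_s[] = "hello, world";
    const int n = sizeof(host_s) - 1;            /* exclude trailing '\0' */
    for (int i = 0; i < n; i++) host_s[i] -= 1;  /* scramble on the host */

    char *dev_s;
    cudaMalloc(&dev_s, sizeof(host_s));
    cudaMemcpy(dev_s, host_s, sizeof(host_s), cudaMemcpyHostToDevice);

    build_string<<<1, n>>>(dev_s);               /* operate on the GPU */

    cudaMemcpy(host_s, dev_s, sizeof(host_s), cudaMemcpyDeviceToHost);
    cudaFree(dev_s);

    printf("%s\n", host_s);                      /* prints: hello, world */
    return 0;
}
```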
"modtree/gpu" Recommended Environment
ModuleTree (modtree) helps users navigate between the CPU and GPU software stacks and sets up a default compiler and MPI environment. For the Anvil cluster, our team recommends a specific CUDA version, compiler, and MPI library: a proven, stable combination to use if you have no particular requirements. To load the recommended set:
$ module load modtree/gpu
$ module list
# the following modules will be loaded
Currently Loaded Modules:
1) gcc/8.4.1 2) numactl/2.0.14 3) zlib/1.2.11 4) openmpi/4.0.6 5) cuda/11.2.2 6) modtree/gpu
Both login and GPU-enabled compute nodes have the CUDA tools and libraries available to compile CUDA programs. For complex compilations, submit an interactive job to get to the GPU-enabled compute nodes; the gpu-debug queue is ideal for this case. To compile a CUDA program, load modtree/gpu and use nvcc to compile the program:
$ module load modtree/gpu
$ nvcc gpu_hello.cu -o gpu_hello
$ ./gpu_hello
No GPU specified, using first GPU
hello, world
The example illustrates only how to copy an array between a CPU and its GPU but does not perform a serious computation.
The following program times three square matrix multiplications on a CPU and on the global and shared memory of a GPU:
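The mm.cu source itself is not reproduced here. Below is an illustrative sketch (filename mm_sketch.cu, all names hypothetical) of what the two GPU variants might look like: a naive kernel reading from global memory and a tiled kernel staging data in shared memory, timed with CUDA events. It assumes the matrix dimension is a multiple of the tile width and omits the CPU reference version and error checking.

```cuda
/* mm_sketch.cu (illustrative) -- square matrix multiply on the GPU,
 * once from global memory and once tiled through shared memory */
#include <stdio.h>

#define TILE 16

/* naive variant: every operand is read from global memory */
__global__ void mm_global(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    float sum = 0.0f;
    for (int k = 0; k < n; k++)
        sum += A[row * n + k] * B[k * n + col];
    C[row * n + col] = sum;
}

/* tiled variant: each block stages TILE x TILE tiles of A and B in fast
 * on-chip shared memory, cutting global-memory traffic */
__global__ void mm_shared(const float *A, const float *B, float *C, int n)
{
    __shared__ float As[TILE][TILE], Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;
    for (int t = 0; t < n / TILE; t++) {
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();               /* wait until both tiles are loaded */
        for (int k = 0; k < TILE; k++)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();               /* wait before overwriting the tiles */
    }
    C[row * n + col] = sum;
}

int main(void)
{
    const int n = 1024;                /* must be a multiple of TILE */
    const size_t bytes = (size_t)n * n * sizeof(float);

    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);      /* unified memory keeps the sketch short */
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n * n; i++) { A[i] = 1.0f; B[i] = 2.0f; }

    dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms_global, ms_shared;

    cudaEventRecord(start);
    mm_global<<<grid, block>>>(A, B, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_global, start, stop);

    cudaEventRecord(start);
    mm_shared<<<grid, block>>>(A, B, C, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_shared, start, stop);

    printf("Elapsed time in GPU (global memory): %.1f milliseconds\n", ms_global);
    printf("Elapsed time in GPU (shared memory): %.1f milliseconds\n", ms_shared);

    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return 0;
}
```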
$ module load modtree/gpu
$ nvcc mm.cu -o mm
$ ./mm 0
                                                         speedup
                                                         -------
Elapsed time in CPU:                 7810.1 milliseconds
Elapsed time in GPU (global memory):   19.8 milliseconds   393.9
Elapsed time in GPU (shared memory):    9.2 milliseconds   846.8
For best performance, the input array or matrix must be sufficiently large to overcome the overhead in copying the input and output data to and from the GPU.
For more information about NVIDIA, CUDA, and GPUs, see NVIDIA's CUDA documentation.