Software

Module System

The Anvil cluster uses Lmod to manage the user environment, so users have access to the necessary software packages and versions to conduct their research activities. The associated module command can be used to load applications and compilers, making the corresponding libraries and environment variables automatically available in the user environment.

Lmod is a hierarchical module system, meaning a module can only be loaded after loading the necessary compilers and MPI libraries that it depends on. This helps avoid conflicting libraries and dependencies being loaded at the same time. A list of all available modules on the system can be found with the module spider command:

$ module spider # list all modules, even those unavailable due to incompatibility with currently loaded modules

-----------------------------------------------------------------------------------
The following is a list of the modules and extensions currently available:
-----------------------------------------------------------------------------------
  amdblis: amdblis/3.0
  amdfftw: amdfftw/3.0
  amdlibflame: amdlibflame/3.0
  amdlibm: amdlibm/3.0
  amdscalapack: amdscalapack/3.0
  anaconda: anaconda/2021.05-py38
  aocc: aocc/3.0
  ...

The module spider command can also be used to search for specific module names.

$ module spider intel # all modules with names containing 'intel'
-----------------------------------------------------------------------------------
  intel:
-----------------------------------------------------------------------------------
     Versions:
        intel/19.0.5.281
        intel/19.1.3.304
     Other possible modules matches:
        intel-mkl
-----------------------------------------------------------------------------------
$ module spider intel/19.1.3.304 # additional details on a specific module
-----------------------------------------------------------------------------------
  intel: intel/19.1.3.304
-----------------------------------------------------------------------------------

    This module can be loaded directly: module load intel/19.1.3.304

    Help:
      Intel Parallel Studio.

When users log into Anvil, a default compiler (GCC), MPI library (OpenMPI), and runtime environment (e.g., CUDA on GPU nodes) are automatically loaded into the user environment. It is recommended that users explicitly specify the modules and versions needed to run their codes in their job scripts via the module load command. Users are advised not to insert module load commands in their bash profiles, as this can cause issues during the initialization of certain software (e.g., ThinLinc).
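For example, a job script might load its modules explicitly before running anything (a minimal sketch; the allocation name, resource requests, and module versions shown here are illustrative placeholders):

#!/bin/bash
#SBATCH -A myallocation          # hypothetical allocation/account name
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --time=01:00:00

module purge                     # start from a clean, predictable environment
module load gcc/10.2.0 openmpi   # load exactly the modules the code was built with

srun ./myprogram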

When users load a module, the module system automatically replaces or deactivates other modules as needed to keep the loaded packages compatible with each other. The following example shows the module system automatically swapping the default Intel compiler for a user-specified version:

$ module load intel # load default version of Intel compiler
$ module list # see currently loaded modules

Currently Loaded Modules:
  1) intel/19.0.5.281

$ module load intel/19.1.3.304 # load a specific version of Intel compiler
$ module list # see currently loaded modules

The following have been reloaded with a version change:
  1) intel/19.0.5.281 => intel/19.1.3.304

Most modules on Anvil include extensive help messages, so users can take advantage of the module help command to find information about a particular application or module. Every module also defines two environment variables, $RCAC_APPNAME_ROOT and $RCAC_APPNAME_VERSION, identifying its installation prefix and its version. Users are encouraged to use the generic environment variables such as CC, CXX, FC, MPICC, and MPICXX made available by the compiler and MPI modules when compiling their code.
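For example, after loading a compiler module, the generic variables can be used directly in a compile line (a sketch; the $RCAC_GCC_ROOT name below simply follows the $RCAC_APPNAME_ROOT pattern for the gcc module):

$ module load gcc
$ echo $CC                    # the C compiler provided by the loaded module
$ echo $RCAC_GCC_ROOT         # installation prefix of the gcc module
$ $CC -O3 myprogram.c -o myprogram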

Some other common module commands:

To unload a module:

$ module unload mymodulename

To unload all loaded modules and reset everything to the original state:

$ module purge

To see all available modules that are compatible with currently loaded modules:

$ module avail

To display information about a specified module, including environment changes, dependencies, software version, and path:

$ module show mymodulename

Compiling, performance, and optimization on Anvil

Anvil CPU nodes have GNU, Intel, and AOCC (AMD) compilers available, along with multiple MPI implementations (OpenMPI, Intel MPI (IMPI), and MVAPICH2). Anvil GPU nodes also provide the PGI compiler. Users may want to note the following AMD Milan-specific optimization options that can help improve the performance of their code on Anvil:

  1. The majority of the applications on Anvil are built using gcc/10.2.0, which features an AMD Milan-specific optimization flag (-march=znver2).

  2. AMD Milan CPUs support the Advanced Vector Extensions 2 (AVX2) instruction set. The GNU, Intel, and AOCC compilers all have flags to enable AVX2. With AVX2, up to eight floating-point operations can be executed per cycle per core, potentially doubling performance relative to non-AVX2 processors running at the same clock speed.

  3. To enable AVX2 support, compile your code with the -march=znver2 flag (for GCC 10.2, Clang, and AOCC compilers) or -march=core-avx2 (for Intel compilers and GCC prior to 9.3), as in the example below.
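For example, a typical optimized build with the GNU compiler might look like the following (a sketch; myprogram.c is a placeholder source file):

$ module load gcc
$ gcc -O3 -march=znver2 myprogram.c -o myprogram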

Other Software Usage Notes:

  1. Use the same environment to run your executables that you used to compile them. When switching between compilers for different applications, make sure you load the appropriate modules before running your executables.

  2. Explicitly set the optimization level in your makefiles or compilation scripts. Most well-written codes can safely use the highest optimization level (-O3), but many compilers set lower levels by default (e.g., GNU compilers default to -O0, which turns off all optimizations).

  3. Turn off debugging, profiling, and bounds checking when building executables intended for production runs as these can seriously impact performance. These options are all disabled by default. The flag used for bounds checking is compiler dependent, but the debugging (-g) and profiling (-pg) flags tend to be the same for all major compilers.

  4. Some compiler options are the same for all available compilers on Anvil (e.g. "-o"), while others differ, and many options are available in one compiler suite but not in others. For example, the Intel, PGI, and GNU compilers use the -qopenmp, -mp, and -fopenmp flags, respectively, for building OpenMP applications.

  5. MPI compiler wrappers (e.g. mpicc, mpif90) all call the appropriate compilers and load the correct MPI libraries depending on the loaded modules. While the same names may be used for different compilers, keep in mind that these are completely independent scripts.

For Python users, Anvil provides two Python distributions: 1) a natively compiled Python module with a small subset of essential numerical libraries, optimized for the AMD Milan architecture, and 2) binaries distributed through Anaconda. Users are encouraged to use virtual environments for installing and using additional Python packages, as sketched below.
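A typical virtual-environment workflow might look like the following (a minimal sketch; the environment path and package are illustrative):

$ module load anaconda                # provides a python interpreter
$ python -m venv $HOME/my-env         # create a virtual environment
$ source $HOME/my-env/bin/activate    # activate it
$ pip install numpy                   # packages now install into my-env only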

A broad range of application modules from various science and engineering domains are installed on Anvil, including mathematics and statistical modeling tools, visualization software, computational fluid dynamics codes, molecular modeling packages, and debugging tools.

In addition, Singularity is supported on Anvil, and NVIDIA GPU Cloud (NGC) containers are available on Anvil GPU nodes.

Compiling Source Code

This section provides some examples of compiling source code on Anvil.

Compiling Serial Programs

A serial program is a single process which executes as a sequential stream of instructions on one processor core. Compilers capable of serial programming are available for C, C++, and versions of Fortran.

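As a stand-in test case, a minimal serial program in C might look like this (illustrative only; any small program will do):

/* hello.c: a minimal serial program */
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}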

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load aocc

The following table illustrates how to compile your serial program:

Fortran 77:
  Intel:  $ ifort myprogram.f -o myprogram
  GNU:    $ gfortran myprogram.f -o myprogram
  AOCC:   $ flang myprogram.f -o myprogram

Fortran 90:
  Intel:  $ ifort myprogram.f90 -o myprogram
  GNU:    $ gfortran myprogram.f90 -o myprogram
  AOCC:   $ flang myprogram.f90 -o myprogram

Fortran 95:
  Intel:  $ ifort myprogram.f90 -o myprogram
  GNU:    $ gfortran myprogram.f95 -o myprogram
  AOCC:   $ flang myprogram.f90 -o myprogram

C:
  Intel:  $ icc myprogram.c -o myprogram
  GNU:    $ gcc myprogram.c -o myprogram
  AOCC:   $ clang myprogram.c -o myprogram

C++:
  Intel:  $ icc myprogram.cpp -o myprogram
  GNU:    $ g++ myprogram.cpp -o myprogram
  AOCC:   $ clang++ myprogram.cpp -o myprogram

The Intel, GNU, and AOCC compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; you may use ".f90" for Fortran code of any version, as it denotes free-form source.

Compiling MPI Programs

OpenMPI, Intel MPI (IMPI) and MVAPICH2 are implementations of the Message-Passing Interface (MPI) standard. Libraries for these MPI implementations and compilers for C, C++, and Fortran are available on Anvil.

MPI programs require including a header file:
Language      Header File
Fortran 77    INCLUDE 'mpif.h'
Fortran 90    INCLUDE 'mpif.h'
Fortran 95    INCLUDE 'mpif.h'
C             #include <mpi.h>
C++           #include <mpi.h>

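As an illustrative stand-in, a minimal MPI program in C might look like this:

/* mpi_hello.c: each MPI rank reports itself */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}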

To see the available MPI libraries:

$ module avail openmpi 
$ module avail impi
$ module avail mvapich2

The following table illustrates how to compile your MPI program. Any compiler flags accepted by the Intel ifort/icc compilers are compatible with their respective MPI compiler wrappers.

Fortran 77:
  Intel compiler with Intel MPI (IMPI):          $ mpiifort program.f -o program
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpif77 program.f -o program

Fortran 90:
  Intel compiler with Intel MPI (IMPI):          $ mpiifort program.f90 -o program
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpif90 program.f90 -o program

Fortran 95:
  Intel compiler with Intel MPI (IMPI):          $ mpiifort program.f90 -o program
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpif90 program.f90 -o program

C:
  Intel compiler with Intel MPI (IMPI):          $ mpiicc program.c -o program
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpicc program.c -o program

C++:
  Intel compiler with Intel MPI (IMPI):          $ mpiicpc program.C -o program
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpicxx program.C -o program

The Intel, GNU, and AOCC compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; you may use ".f90" for Fortran code of any version, as it denotes free-form source.

Compiling OpenMP Programs

All compilers installed on Anvil include OpenMP functionality for C, C++, and Fortran. An OpenMP program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over processor cores in a single compute node without the need for MPI communications.

OpenMP programs require including a header file:
Language      Header Files
Fortran 77    INCLUDE 'omp_lib.h'
Fortran 90    use omp_lib
Fortran 95    use omp_lib
C             #include <omp.h>
C++           #include <omp.h>

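As an illustrative stand-in, a minimal OpenMP program in C using loop-level (data) parallelism might look like this:

/* omp_hello.c: parallelize a simple loop over threads */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const int n = 8;
    int squares[8];

    /* distribute loop iterations across the available threads */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        squares[i] = i * i;

    for (int i = 0; i < n; i++)
        printf("%d squared is %d\n", i, squares[i]);
    printf("ran with up to %d threads\n", omp_get_max_threads());
    return 0;
}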

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load aocc

The following table illustrates how to compile your shared-memory program. Any compiler flags accepted by the ifort/icc compilers are compatible with OpenMP.

Fortran 77:
  Intel:  $ ifort -qopenmp myprogram.f -o myprogram
  GNU:    $ gfortran -fopenmp myprogram.f -o myprogram
  AOCC:   $ flang -fopenmp myprogram.f -o myprogram

Fortran 90:
  Intel:  $ ifort -qopenmp myprogram.f90 -o myprogram
  GNU:    $ gfortran -fopenmp myprogram.f90 -o myprogram
  AOCC:   $ flang -fopenmp myprogram.f90 -o myprogram

Fortran 95:
  Intel:  $ ifort -qopenmp myprogram.f90 -o myprogram
  GNU:    $ gfortran -fopenmp myprogram.f90 -o myprogram
  AOCC:   $ flang -fopenmp myprogram.f90 -o myprogram

C:
  Intel:  $ icc -qopenmp myprogram.c -o myprogram
  GNU:    $ gcc -fopenmp myprogram.c -o myprogram
  AOCC:   $ clang -fopenmp myprogram.c -o myprogram

C++:
  Intel:  $ icc -qopenmp myprogram.cpp -o myprogram
  GNU:    $ g++ -fopenmp myprogram.cpp -o myprogram
  AOCC:   $ clang++ -fopenmp myprogram.cpp -o myprogram

The Intel, GNU, and AOCC compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; you may use ".f90" for Fortran code of any version, as it denotes free-form source.

Compiling Hybrid Programs

A hybrid program combines both MPI and shared-memory parallelism to take advantage of compute clusters with multi-core compute nodes. Libraries for OpenMPI, Intel MPI (IMPI), and MVAPICH2, as well as compilers with OpenMP support for C, C++, and Fortran, are available.

Hybrid programs require including header files:
Language      Header Files
Fortran 77    INCLUDE 'omp_lib.h'
              INCLUDE 'mpif.h'
Fortran 90    use omp_lib
              INCLUDE 'mpif.h'
Fortran 95    use omp_lib
              INCLUDE 'mpif.h'
C             #include <mpi.h>
              #include <omp.h>
C++           #include <mpi.h>
              #include <omp.h>

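As an illustrative stand-in, a minimal hybrid MPI/OpenMP program in C might look like this:

/* hybrid_hello.c: MPI ranks each spawn a team of OpenMP threads */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, provided;

    /* request thread support appropriate for a hybrid program */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each MPI rank runs this region with its own team of threads */
    #pragma omp parallel
    {
        printf("rank %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}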

To see the available MPI libraries:

$ module avail impi
$ module avail openmpi
$ module avail mvapich2

The following table illustrates how to compile your hybrid (MPI/OpenMP) program. Any compiler flags accepted by the Intel ifort/icc compilers are compatible with their respective MPI compiler wrappers.

Fortran 77:
  Intel compiler with Intel MPI (IMPI):          $ mpiifort -qopenmp myprogram.f -o myprogram
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpif77 -fopenmp myprogram.f -o myprogram

Fortran 90:
  Intel compiler with Intel MPI (IMPI):          $ mpiifort -qopenmp myprogram.f90 -o myprogram
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpif90 -fopenmp myprogram.f90 -o myprogram

Fortran 95:
  Intel compiler with Intel MPI (IMPI):          $ mpiifort -qopenmp myprogram.f90 -o myprogram
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpif90 -fopenmp myprogram.f90 -o myprogram

C:
  Intel compiler with Intel MPI (IMPI):          $ mpiicc -qopenmp myprogram.c -o myprogram
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpicc -fopenmp myprogram.c -o myprogram

C++:
  Intel compiler with Intel MPI (IMPI):          $ mpiicpc -qopenmp myprogram.C -o myprogram
  Intel/GNU/AOCC compiler with OpenMPI/MVAPICH2: $ mpicxx -fopenmp myprogram.C -o myprogram

The Intel, GNU, and AOCC compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95"; you may use ".f90" for Fortran code of any version, as it denotes free-form source.

Compiling NVIDIA GPU Programs

The Anvil cluster contains GPU nodes that support CUDA and OpenCL. See the detailed hardware overview for the specifics on the GPUs in Anvil. This section focuses on using CUDA.

A simple CUDA program has a basic workflow:

  • Initialize an array on the host (CPU).
  • Copy array from host memory to GPU memory.
  • Apply an operation to array on GPU.
  • Copy array from GPU memory to host memory.

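Following that workflow, a minimal CUDA program might look like the following (a sketch in the spirit of the gpu_hello example compiled below; the actual sample's details may differ):

/* gpu_hello.cu: copy an array to the GPU, square it, copy it back */
#include <stdio.h>

__global__ void square(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] = a[i] * a[i];
}

int main(void)
{
    const int n = 8;
    int host[8], *dev;

    for (int i = 0; i < n; i++)                  /* initialize on the host (CPU) */
        host[i] = i;

    cudaMalloc((void **)&dev, n * sizeof(int));  /* allocate GPU memory */
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);
    square<<<1, n>>>(dev, n);                    /* apply an operation on the GPU */
    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("hello, world\n");
    return 0;
}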

ModuleTree, or modtree, helps users navigate between the CPU and GPU software stacks and sets up a default compiler and MPI environment. For the Anvil cluster, our team makes a recommendation regarding the CUDA version, compiler, and MPI library; this is a proven, stable combination recommended if you have no specific requirements. To load the recommended set:

$ module load modtree/gpu
$ module list
# you will have all of the following modules
Currently Loaded Modules:
  1) gcc/8.4.1   2) numactl/2.0.14   3) zlib/1.2.11   4) openmpi/4.0.6   5) cuda/11.2.2   6) modtree/gpu

Both login and GPU-enabled compute nodes have the CUDA tools and libraries available to compile CUDA programs. For complex compilations, submit an interactive job to get to the GPU-enabled compute nodes. The gpu-debug queue is ideal for this case. To compile a CUDA program, load modtree/gpu, and use nvcc to compile the program:

$ module load modtree/gpu
$ nvcc gpu_hello.cu -o gpu_hello
$ ./gpu_hello
No GPU specified, using first GPU
hello, world

The example illustrates only how to copy an array between a CPU and its GPU but does not perform a serious computation.
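As noted above, an interactive job is the easiest way to reach a GPU-enabled compute node for this kind of work. With standard Slurm commands, that might look like the following (a sketch; the allocation name is a placeholder, and options may need adjusting for your account):

$ salloc -p gpu-debug --gpus-per-node=1 -t 00:30:00 -A myallocation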

The following program times three square matrix multiplications on a CPU and on the global and shared memory of a GPU:

$ module load modtree/gpu
$ nvcc mm.cu -o mm
$ ./mm 0
                                                            speedup
                                                            -------
Elapsed time in CPU:                    7810.1 milliseconds
Elapsed time in GPU (global memory):      19.8 milliseconds  393.9
Elapsed time in GPU (shared memory):       9.2 milliseconds  846.8

For best performance, the input array or matrix must be sufficiently large to overcome the overhead in copying the input and output data to and from the GPU.

Provided Software

The Anvil team provides a suite of broadly useful software for users of research computing resources. This suite of software includes compilers, debuggers, visualization libraries, development environments, and other commonly used software libraries. Additionally, some widely-used application software is provided.

ModuleTree, or modtree, helps users navigate between the CPU and GPU software stacks and sets up a default compiler and MPI environment. For the Anvil cluster, our team makes recommendations for both the CPU and GPU stacks regarding the CUDA version, compiler, math library, and MPI library. These are proven, stable combinations recommended if you have no specific requirements. To load the recommended set:

$ module load modtree/cpu # for CPU
$ module load modtree/gpu # for GPU

GCC Compiler

The GNU Compiler Collection (GCC) is provided via the module command on Anvil and is maintained at a common version compatible across clusters. Third-party software built with GCC will use this GCC version rather than the GCC provided by the operating system vendor. To see the GCC compiler versions available from the module command:

$ module avail gcc

Toolchain

The Anvil team will build and maintain an integrated, tested, and supported toolchain of compilers, MPI libraries, data format libraries, and other common libraries. This toolchain will consist of:

  • Compiler suite (C, C++, Fortran) (Intel, GCC and AOCC)
  • BLAS and LAPACK
  • MPI libraries (OpenMPI, MVAPICH, Intel MPI)
  • FFTW
  • HDF5
  • NetCDF

Each of these software packages will be combined with the stable "modtree/cpu" compiler, the latest available Intel compiler, and the common GCC compiler. The goal of these toolchains is to provide a range of compatible compiler and library suites that can be selected to build a wide variety of applications. At the same time, the number of compiler and library combinations is limited to keep the selection easy to navigate and understand. Generally, the toolchain built with the latest Intel compiler will be updated at major releases of the compiler.

Commonly Used Applications

The Anvil team will make every effort to provide a broadly useful set of popular software packages for research cluster users. Software packages such as MATLAB, Python (Anaconda), NAMD, GROMACS, R, and others that are useful to a wide range of cluster users are provided via the module command.

Changes to Provided Software

Changes to available software, such as the introduction of new compilers and libraries or the retirement of older toolchains, will be scheduled in advance and coordinated with system maintenance windows. This is done to minimize impact and provide a predictable timeline for changes. Advance notice will be given through regular maintenance announcements and through notices printed at module load time. Be sure to check maintenance announcements and job output for any upcoming changes.

Long Term Support

The Anvil team understands the need for a stable and unchanging suite of compilers and libraries, as research projects are often tied to specific compiler versions throughout their lifetime. The Anvil team will make every effort to provide the "modtree/cpu" and "modtree/gpu" environments and the common GCC compiler as long-term supported environments. These suites will stay unchanged for longer periods than the toolchain built with the latest available Intel compiler.
