
cuda

Description

CUDA is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Versions

  • Scholar: 9.0.176, 10.2.89, 11.2.2, 11.8.0
  • Gilbreth: 8.0.61, 9.0.176, 10.0.130, 10.2.89, 11.0.3, 11.2.0, 11.7.0
  • Anvil: 11.0.3, 11.2.2, 11.4.2

Module

You can load the module with:

module load cuda
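To load a specific version rather than the default, you can first list what is installed. A sketch of the typical workflow (the version shown, 11.8.0, is taken from the Scholar list above; pick one available on your cluster):

```shell
# List the CUDA versions installed on this cluster:
module avail cuda

# Load a specific version instead of the default:
module load cuda/11.8.0

# Verify which toolkit is now on your PATH:
nvcc --version
```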

Monitor Activity and Drivers

Users can check the available GPUs, their current utilization, the installed NVIDIA driver version, and running processes with the command nvidia-smi. The output should look something like this:

User@gilbreth-fe00:~/cuda $ nvidia-smi
Sat May 27 23:26:14 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.48.07    Driver Version: 515.48.07    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A30          Off  | 00000000:21:00.0 Off |                    0 |
| N/A   29C    P0    29W / 165W |  19802MiB / 24576MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     29152      C   python                           9107MiB |
|    0   N/A  N/A     53947      C   ...020.11-py38/GP/bin/python     2611MiB |
|    0   N/A  N/A     71769      C   ...020.11-py38/GP/bin/python     1241MiB |
|    0   N/A  N/A     72821      C   ...8/TorchGPU_env/bin/python     2657MiB |
|    0   N/A  N/A     91986      C   ...2-4/internal/bin/gdesmond      931MiB |
+-----------------------------------------------------------------------------+

We can see that the node gilbreth-fe00 is running driver version 515.48.07, which supports CUDA versions up to 11.7. We do not recommend running jobs on front-end nodes, but here we can see four Python processes and one gdesmond process.
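For longer-running monitoring, nvidia-smi can also refresh continuously or emit selected fields in machine-readable form, for example:

```shell
# Refresh the nvidia-smi output every 5 seconds (press Ctrl-C to stop):
nvidia-smi -l 5

# Query selected fields as CSV (useful for logging GPU memory over time):
nvidia-smi --query-gpu=name,driver_version,memory.used,memory.total --format=csv
```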

Compile CUDA code

The vector_addition.cu file below is adapted from the textbook Learn CUDA Programming. Note that this first version runs entirely on the host (CPU):

#include <stdio.h>
#include <stdlib.h>

#define N 512

// Serial CPU vector addition: c[idx] = a[idx] + b[idx].
void host_add(int *a, int *b, int *c) {
	for (int idx = 0; idx < N; idx++)
		c[idx] = a[idx] + b[idx];
}

// Fill the array with its indices: data[idx] = idx.
void fill_array(int *data) {
	for (int idx = 0; idx < N; idx++)
		data[idx] = idx;
}

void print_output(int *a, int *b, int *c) {
	for (int idx = 0; idx < N; idx++)
		printf("\n %d + %d = %d", a[idx], b[idx], c[idx]);
}

int main(void) {
	int *a, *b, *c;
	int size = N * sizeof(int);

	// Allocate space for host copies of a, b, c and set up input values.
	a = (int *)malloc(size); fill_array(a);
	b = (int *)malloc(size); fill_array(b);
	c = (int *)malloc(size);

	host_add(a, b, c);
	print_output(a, b, c);

	free(a); free(b); free(c);

	return 0;
}

We can compile the CUDA code with the NVIDIA nvcc compiler:

nvcc -o vector_addition vector_addition.cu
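For comparison, here is a minimal GPU port of the same example, following the standard CUDA pattern: allocate device memory, copy inputs to the GPU, launch a kernel, and copy the result back. The launch configuration (4 blocks of 128 threads) is one illustrative choice that covers N = 512 elements:

```cuda
#include <stdio.h>
#include <stdlib.h>

#define N 512

// GPU kernel: each thread adds one element of the vectors.
__global__ void device_add(int *a, int *b, int *c) {
	int idx = blockIdx.x * blockDim.x + threadIdx.x;
	if (idx < N)
		c[idx] = a[idx] + b[idx];
}

void fill_array(int *data) {
	for (int idx = 0; idx < N; idx++)
		data[idx] = idx;
}

int main(void) {
	int *a, *b, *c;          // host copies
	int *d_a, *d_b, *d_c;    // device copies
	int size = N * sizeof(int);

	a = (int *)malloc(size); fill_array(a);
	b = (int *)malloc(size); fill_array(b);
	c = (int *)malloc(size);

	cudaMalloc((void **)&d_a, size);
	cudaMalloc((void **)&d_b, size);
	cudaMalloc((void **)&d_c, size);

	// Copy the inputs to the GPU.
	cudaMemcpy(d_a, a, size, cudaMemcpyHostToDevice);
	cudaMemcpy(d_b, b, size, cudaMemcpyHostToDevice);

	// Launch 4 blocks of 128 threads each (4 * 128 = 512 = N).
	device_add<<<4, 128>>>(d_a, d_b, d_c);

	// Copy the result back to the host.
	cudaMemcpy(c, d_c, size, cudaMemcpyDeviceToHost);

	for (int idx = 0; idx < N; idx++)
		printf("\n %d + %d = %d", a[idx], b[idx], c[idx]);

	free(a); free(b); free(c);
	cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
	return 0;
}
```

This compiles with the same nvcc command shown above. If you know the target GPU's compute capability, you can also pass it explicitly, e.g. `nvcc -arch=sm_80 ...` for the A30 (compute capability 8.0) shown in the nvidia-smi output.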

Example job script

#!/bin/bash

#SBATCH -A XXX
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-node=1
#SBATCH --time=1:00:00

module purge
module load gcc/XXX
module load cuda/XXX

#compile the vector_addition.cu file
nvcc -o vector_addition vector_addition.cu

#runs the vector_addition program
./vector_addition
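Assuming the script above is saved to a file (the name vector_addition.sub here is just an example), you would submit and monitor it with the usual Slurm commands:

```shell
# Submit the job script to the scheduler:
sbatch vector_addition.sub

# Check the status of your queued and running jobs:
squeue -u $USER
```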
