
deepspeed

Description

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Versions

  • Bell: rocm4.2_ubuntu18.04_py3.6_pytorch_1.8.1
  • Negishi: rocm4.2_ubuntu18.04_py3.6_pytorch_1.8.1

Module

Load the modules with:

module load rocmcontainers
module load deepspeed

Example job

Using #!/bin/sh -l as the shebang in a Slurm job script causes some biocontainer modules to fail. Use #!/bin/bash instead.
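If you keep several job scripts, a quick check of their first line can catch the problem above before submission. The following is a minimal sketch; the function name check_shebang is hypothetical (not part of any installed module), and it accepts only the exact line #!/bin/bash recommended here.

```shell
#!/bin/bash
# Hypothetical helper: warn when a Slurm job script's first line is not
# exactly "#!/bin/bash", since "#!/bin/sh -l" breaks some biocontainer modules.
check_shebang() {
    local script="$1"
    local first_line=""
    # Read only the first line of the script, preserving whitespace.
    IFS= read -r first_line < "$script"
    if [ "$first_line" = "#!/bin/bash" ]; then
        echo "ok: $script"
        return 0
    fi
    echo "warning: $script starts with '$first_line'; use #!/bin/bash" >&2
    return 1
}

# Usage: check_shebang myjob.sh
```

The function returns a nonzero status on a mismatch, so it can also be used in a loop such as for f in *.sh; do check_shebang "$f"; done.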

To run DeepSpeed on our clusters, use a job script like the following:

#!/bin/bash
#SBATCH -A gpu
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -c 8
#SBATCH --gpus-per-node=1
#SBATCH --job-name=deepspeed
#SBATCH --mail-type=FAIL,BEGIN,END
#SBATCH --error=%x-%J-%u.err
#SBATCH --output=%x-%J-%u.out

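# Start from a clean module environment, then load DeepSpeed
module --force purge
ml rocmcontainers deepspeed
# Add the commands that run your DeepSpeed workload here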
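The job script above only loads the modules. The commands that actually run DeepSpeed go after the ml line. The sketch below is one possible shape for those lines, under stated assumptions: train.py is a hypothetical training script that accepts DeepSpeed's --deepspeed and --deepspeed_config arguments (for example by calling deepspeed.add_config_arguments on its argument parser), ds_config.json is a placeholder file name, the config values are illustrative only, and it is assumed that loading the module puts the deepspeed launcher on PATH.

```shell
#!/bin/bash
# Write a minimal DeepSpeed config file. train_batch_size, optimizer and
# zero_optimization are standard DeepSpeed config keys; the values are
# illustrative, not tuned recommendations.
cat > ds_config.json <<'EOF'
{
  "train_batch_size": 8,
  "optimizer": {
    "type": "Adam",
    "params": { "lr": 0.0001 }
  },
  "zero_optimization": { "stage": 1 }
}
EOF

# Launch only if the deepspeed launcher is on PATH (assumed to be the case
# after "ml rocmcontainers deepspeed"); train.py is a hypothetical script.
if command -v deepspeed >/dev/null 2>&1; then
    deepspeed --num_gpus=1 train.py --deepspeed --deepspeed_config ds_config.json
else
    echo "deepspeed launcher not found; load the module first" >&2
fi
```

With --num_gpus=1 the launcher uses the single GPU requested by --gpus-per-node=1. Save the complete job script (for example as deepspeed.sh, a placeholder name) and submit it with sbatch deepspeed.sh.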