Link to section 'Running Tensorflow code in a batch job' of 'Tensorflow Batch Job' Running Tensorflow code in a batch job

Batch jobs allow us to automate model training without human intervention. They are also useful when you need to run a large number of simulations on the clusters. In the example below, we shall run the tensor_hello.py script in a batch job (refer to Tensorflow guide to see the code). We consider two situations: in the first example, we use the ML-Toolkit modules to run tensorflow, while in the second example, we use our custom installation of tensorflow.

Link to section 'Using Ml-Toolkit modules' of 'Tensorflow Batch Job' Using Ml-Toolkit modules

Save the following code as tensor_hello.sub in the same directory where tensor_hello.py is located.

# filename: tensor_hello.sub

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20 
#SBATCH --time=00:05:00
#SBATCH -A standby
#SBATCH -J hello_tensor

module purge

module load learning/conda-5.1.0-py36-cpu
module load ml-toolkit-cpu/tensorflow 
module list

python tensor_hello.py

Link to section 'Using custom tensorflow installation' of 'Tensorflow Batch Job' Using custom tensorflow installation

Save the following code as tensor_hello.sub in the same directory where tensor_hello.py is located.

# filename: tensor_hello.sub

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=20 
#SBATCH --time=00:05:00
#SBATCH -A standby
#SBATCH -J hello_tensor

module purge

module load anaconda/5.1.0-py36
module load use.own
module load conda-env/my_tf_env-py3.6.4 
module list

echo $PYTHONPATH

python tensor_hello.py

Now you can submit the batch job using the sbatch command.

$ sbatch tensor_hello.sub

Once the job finishes, you will find an output (slurm-xxxxx.out). If tensorflow ran successfully, then the output file will contain the message shown below.

Hello, TensorFlow!
Helpful?

Thanks for letting us know.

Please don’t include any personal information in your comment. Maximum character limit is 250.
Characters left: 250
Thanks for your feedback.