Skip to main content

Python

Notice: Python 2.7 has reached end-of-life on Jan 1, 2020 (announcement). Please update your codes and your job scripts to use Python 3.

Python is a high-level, general-purpose, interpreted, dynamic programming language. We suggest using Anaconda which is a Python distribution made for large-scale data processing, predictive analytics, and scientific computing. For example, to use the default Anaconda distribution:

$ module load anaconda

For a full list of available Anaconda and Python modules enter:

$ module spider anaconda

Example Python Jobs

This section illustrates how to submit a small Python job to a PBS queue.

Link to section 'Example 1: Hello world' of 'Example Python Jobs' Example 1: Hello world

Prepare a Python input file with an appropriate filename, here named myjob.in:

# FILENAME:  hello.py

import string, sys
print "Hello, world!"

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/bash
# FILENAME:  myjob.sub

module load anaconda

python hello.py

Submit the job

View job status

View results of the job

Hello, world!

Link to section 'Example 2: Matrix multiply' of 'Example Python Jobs' Example 2: Matrix multiply

Save the following script as matrix.py:

# Matrix multiplication program

x = [[3,1,4],[1,5,9],[2,6,5]]
y = [[3,5,8,9],[7,9,3,2],[3,8,4,6]]

result = [[sum(a*b for a,b in zip(x_row,y_col)) for y_col in zip(*y)] for x_row in x]

for r in result:
        print(r)

Change the last line in the job submission file above to read:

python matrix.py

The standard output file from this job will result in the following matrix:

[28, 56, 43, 53]
[65, 122, 59, 73]
[63, 104, 54, 60]

Link to section 'Example 3: Sine wave plot using numpy and matplotlib packages' of 'Example Python Jobs' Example 3: Sine wave plot using numpy and matplotlib packages

Save the following script as sine.py:

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pylab as plt

x = np.linspace(-np.pi, np.pi, 201)
plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.savefig('sine.png')

Change your job submission file to submit this script and the job will output a png file and blank standard output and error files.

For more information about Python:

Managing Environments with Conda

Conda is a package manager in Anaconda that allows you to create and manage multiple environments where you can pick and choose which packages you want to use. To use Conda you must load an Anaconda module:

$ module load anaconda

Many packages are pre-installed in the global environment. To see these packages:

$ conda list

To create your own custom environment:

$ conda create --name MyEnvName python=3.8 FirstPackageName SecondPackageName -y

The --name option specifies that the environment created will be named MyEnvName. You can include as many packages as you require separated by a space. Including the -y option lets you skip the prompt to install the package. By default environments are created and stored in the $HOME/.conda directory.

To create an environment at a custom location:

$ conda create --prefix=$HOME/MyEnvName python=3.8 PackageName -y

To see a list of your environments:

$ conda env list

To remove unwanted environments:

$ conda remove --name MyEnvName --all

To add packages to your environment:

$ conda install --name MyEnvName PackageNames

To remove a package from an environment:

$ conda remove --name MyEnvName PackageName

Installing packages when creating your environment, instead of one at a time, will help you avoid dependency issues.

To activate or deactivate an environment you have created:

$ source activate MyEnvName
$ source deactivate MyEnvName

If you created your conda environment at a custom location using --prefix option, then you can activate or deactivate it using the full path.

$ source activate $HOME/MyEnvName
$ source deactivate $HOME/MyEnvName

To use a custom environment inside a job you must load the module and activate the environment inside your job submission script. Add the following lines to your submission script:

$ module load anaconda
$ source activate MyEnvName

For more information about Python:

Managing Packages with Pip

Pip is a Python package manager. Many Python package documentation provide pip instructions that result in permission errors because by default pip will install in a system-wide location and fail.


Exception:
Traceback (most recent call last):
... ... stack trace ... ...
OSError: [Errno 13] Permission denied: '/apps/cent7/anaconda/2020.07-py38/lib/python3.8/site-packages/mkl_random-1.1.1.dist-info'

If you encounter this error, it means that you cannot modify the global Python installation. We recommend installing Python packages in a conda environment. Detailed instructions for installing packages with pip can be found in our Python package installation page.

Below we list some other useful pip commands.

  • Search for a package in PyPI channels:
    $ pip search packageName
    
  • Check which packages are installed globally:
    $ pip list
    
  • Check which packages you have personally installed:
    $ pip list --user
    
  • Snapshot installed packages:
    $ pip freeze > requirements.txt
    
  • You can install packages from a snapshot inside a new conda environment. Make sure to load the appropriate conda environment first.
    $ pip install -r requirements.txt
    

For more information about Python:

Installing Packages

Installing Python packages in an Anaconda environment is recommended. One key advantage of Anaconda is that it allows users to install unrelated packages in separate self-contained environments. Individual packages can later be reinstalled or updated without impacting others. If you are unfamiliar with Conda environments, please check our Conda Guide.

To facilitate the process of creating and using Conda environments, we support a script (conda-env-mod) that generates a module file for an environment, as well as an optional Jupyter kernel to use this environment in a JupyterHub notebook.

You must load one of the anaconda modules in order to use this script.

$ module load anaconda

Step-by-step instructions for installing custom Python packages are presented below.

Link to section 'Step 1: Create a conda environment' of 'Installing Packages' Step 1: Create a conda environment

Users can use the conda-env-mod script to create an empty conda environment. This script needs either a name or a path for the desired environment. After the environment is created, it generates a module file for using it in future. Please note that conda-env-mod is different from the official conda-env script and supports a limited set of subcommands. Detailed instructions for using conda-env-mod can be found with the command conda-env-mod --help.

  • Example 1: Create a conda environment named mypackages in user's $HOME directory.

    $ conda-env-mod create -n mypackages
  • Example 2: Create a conda environment named mypackages at a custom location.

    $ conda-env-mod create -p /depot/mylab/apps/mypackages

    Please follow the on-screen instructions while the environment is being created. After finishing, the script will print the instructions to use this environment.

    
    ... ... ...
    Preparing transaction: ...working... done
    Verifying transaction: ...working... done
    Executing transaction: ...working... done
    +------------------------------------------------------+
    | To use this environment, load the following modules: |
    |       module load use.own                            |
    |       module load conda-env/mypackages-py3.8.5      |
    +------------------------------------------------------+
    Your environment "mypackages" was created successfully.
    

Note down the module names, as you will need to load these modules every time you want to use this environment. You may also want to add the module load lines in your jobscript, if it depends on custom Python packages.

By default, module files are generated in your $HOME/privatemodules directory. The location of module files can be customized by specifying the -m /path/to/modules option to conda-env-mod.

Note: The main differences between -p and -m are: 1) -p will change the location of packages to be installed for the env and the module file will still be located at the $HOME/privatemodules directory as defined in use.own. 2) -m will only change the location of the module file. So the method to load modules created with -m and -p are different, see Example 3 for details.

  • Example 3: Create a conda environment named labpackages in your group's Data Depot space and place the module file at a shared location for the group to use.
    $ conda-env-mod create -p /depot/mylab/apps/labpackages -m /depot/mylab/etc/modules
    ... ... ...
    Preparing transaction: ...working... done
    Verifying transaction: ...working... done
    Executing transaction: ...working... done
    +-------------------------------------------------------+
    | To use this environment, load the following modules:  |
    |       module use /depot/mylab/etc/modules             |
    |       module load conda-env/labpackages-py3.8.5      |
    +-------------------------------------------------------+
    Your environment "labpackages" was created successfully.
    

If you used a custom module file location, you need to run the module use command as printed by the command output above.

By default, only the environment and a module file are created (no Jupyter kernel). If you plan to use your environment in a JupyterHub notebook, you need to append a --jupyter flag to the above commands.

  • Example 4: Create a Jupyter-enabled conda environment named labpackages in your group's Data Depot space and place the module file at a shared location for the group to use.
    $ conda-env-mod create -p /depot/mylab/apps/labpackages -m /depot/mylab/etc/modules --jupyter
    ... ... ...
    Jupyter kernel created: "Python (My labpackages Kernel)"
    ... ... ...
    Your environment "labpackages" was created successfully.
    

Link to section 'Step 2: Load the conda environment' of 'Installing Packages' Step 2: Load the conda environment

  • The following instructions assume that you have used conda-env-mod script to create an environment named mypackages (Examples 1 or 2 above). If you used conda create instead, please use conda activate mypackages.

    $ module load use.own
    $ module load conda-env/mypackages-py3.8.5
    

    Note that the conda-env module name includes the Python version that it supports (Python 3.8.5 in this example). This is same as the Python version in the anaconda module.

  • If you used a custom module file location (Example 3 above), please use module use to load the conda-env module.

    $ module use /depot/mylab/etc/modules
    $ module load conda-env/labpackages-py3.8.5
    

Link to section 'Step 3: Install packages' of 'Installing Packages' Step 3: Install packages

Now you can install custom packages in the environment using either conda install or pip install.

Link to section 'Installing with conda' of 'Installing Packages' Installing with conda

  • Example 1: Install OpenCV (open-source computer vision library) using conda.

    $ conda install opencv
  • Example 2: Install a specific version of OpenCV using conda.

    $ conda install opencv=4.5.5
  • Example 3: Install OpenCV from a specific anaconda channel.

    $ conda install -c anaconda opencv

Link to section 'Installing with pip' of 'Installing Packages' Installing with pip

  • Example 4: Install pandas using pip.

    $ pip install pandas
  • Example 5: Install a specific version of pandas using pip.

    $ pip install pandas==1.4.3

    Follow the on-screen instructions while the packages are being installed. If installation is successful, please proceed to the next section to test the packages.

Note: Do NOT run Pip with the --user argument, as that will install packages in a different location and might mess up your account environment.

Link to section 'Step 4: Test the installed packages' of 'Installing Packages' Step 4: Test the installed packages

To use the installed Python packages, you must load the module for your conda environment. If you have not loaded the conda-env module, please do so following the instructions at the end of Step 1.

$ module load use.own
$ module load conda-env/mypackages-py3.8.5
  • Example 1: Test that OpenCV is available.
    $ python -c "import cv2; print(cv2.__version__)"
    
  • Example 2: Test that pandas is available.
    $ python -c "import pandas; print(pandas.__version__)"
    

If the commands finished without errors, then the installed packages can be used in your program.

Link to section 'Additional capabilities of conda-env-mod script' of 'Installing Packages' Additional capabilities of conda-env-mod script

The conda-env-mod tool is intended to facilitate creation of a minimal Anaconda environment, matching module file and optionally a Jupyter kernel. Once created, the environment can then be accessed via familiar module load command, tuned and expanded as necessary. Additionally, the script provides several auxiliary functions to help manage environments, module files and Jupyter kernels.

General usage for the tool adheres to the following pattern:

$ conda-env-mod help
$ conda-env-mod <subcommand> <required argument> [optional arguments]

where required arguments are one of

  • -n|--name ENV_NAME (name of the environment)
  • -p|--prefix ENV_PATH (location of the environment)

and optional arguments further modify behavior for specific actions (e.g. -m to specify alternative location for generated module files).

Given a required name or prefix for an environment, the conda-env-mod script supports the following subcommands:

  • create - to create a new environment, its corresponding module file and optional Jupyter kernel.
  • delete - to delete existing environment along with its module file and Jupyter kernel.
  • module - to generate just the module file for a given existing environment.
  • kernel - to generate just the Jupyter kernel for a given existing environment (note that the environment has to be created with a --jupyter option).
  • help - to display script usage help.

Using these subcommands, you can iteratively fine-tune your environments, module files and Jupyter kernels, as well as delete and re-create them with ease. Below we cover several commonly occurring scenarios.

Note: When you try to use conda-env-mod delete, remember to include the arguments as you create the environment (i.e. -p package_location and/or -m module_location).

Link to section 'Generating module file for an existing environment' of 'Installing Packages' Generating module file for an existing environment

If you already have an existing configured Anaconda environment and want to generate a module file for it, follow appropriate examples from Step 1 above, but use the module subcommand instead of the create one. E.g.

$ conda-env-mod module -n mypackages

and follow printed instructions on how to load this module. With an optional --jupyter flag, a Jupyter kernel will also be generated.

Note that the module name mypackages should be exactly the same with the older conda environment name. Note also that if you intend to proceed with a Jupyter kernel generation (via the --jupyter flag or a kernel subcommand later), you will have to ensure that your environment has ipython and ipykernel packages installed into it. To avoid this and other related complications, we highly recommend making a fresh environment using a suitable conda-env-mod create .... --jupyter command instead.

Link to section 'Generating Jupyter kernel for an existing environment' of 'Installing Packages' Generating Jupyter kernel for an existing environment

If you already have an existing configured Anaconda environment and want to generate a Jupyter kernel file for it, you can use the kernel subcommand. E.g.

$ conda-env-mod kernel -n mypackages

This will add a "Python (My mypackages Kernel)" item to the dropdown list of available kernels upon your next login to the JupyterHub.

Note that generated Jupiter kernels are always personal (i.e. each user has to make their own, even for shared environments). Note also that you (or the creator of the shared environment) will have to ensure that your environment has ipython and ipykernel packages installed into it.

Link to section 'Managing and using shared Python environments' of 'Installing Packages' Managing and using shared Python environments

Here is a suggested workflow for a common group-shared Anaconda environment with Jupyter capabilities:

The PI or lab software manager:

  • Creates the environment and module file (once):

    $ module purge
    $ module load anaconda
    $ conda-env-mod create -p /depot/mylab/apps/labpackages -m /depot/mylab/etc/modules --jupyter
    
  • Installs required Python packages into the environment (as many times as needed):

    $ module use /depot/mylab/etc/modules
    $ module load conda-env/labpackages-py3.8.5
    $ conda install  .......                       # all the necessary packages
    

Lab members:

  • Lab members can start using the environment in their command line scripts or batch jobs simply by loading the corresponding module:

    $ module use /depot/mylab/etc/modules
    $ module load conda-env/labpackages-py3.8.5
    $ python my_data_processing_script.py .....
    
  • To use the environment in Jupyter notebooks, each lab member will need to create his/her own Jupyter kernel (once). This is because Jupyter kernels are private to individuals, even for shared environments.

    $ module use /depot/mylab/etc/modules
    $ module load conda-env/labpackages-py3.8.5
    $ conda-env-mod kernel -p /depot/mylab/apps/labpackages
    

A similar process can be devised for instructor-provided or individually-managed class software, etc.

Link to section 'Troubleshooting' of 'Installing Packages' Troubleshooting

  • Python packages often fail to install or run due to dependency incompatibility with other packages. More specifically, if you previously installed packages in your home directory it is safer to clean those installations.
    $ mv ~/.local ~/.local.bak
    $ mv ~/.cache ~/.cache.bak
    
  • Unload all the modules.
    $ module purge
    
  • Clean up PYTHONPATH.
    $ unset PYTHONPATH
    
  • Next load the modules (e.g. anaconda) that you need.
    $ module load anaconda/2020.11-py38
    $ module load use.own
    $ module load conda-env/mypackages-py3.8.5
    
  • Now try running your code again.
  • Few applications only run on specific versions of Python (e.g. Python 3.6). Please check the documentation of your application if that is the case.

Installing Packages from Source

We maintain several Anaconda installations. Anaconda maintains numerous popular scientific Python libraries in a single installation. If you need a Python library not included with normal Python we recommend first checking Anaconda. For a list of modules currently installed in the Anaconda Python distribution:

$ module load anaconda
$ conda list
# packages in environment at /apps/spack/bell/apps/anaconda/2020.02-py37-gcc-4.8.5-u747gsx:
#
# Name                    Version                   Build  Channel
_ipyw_jlab_nb_ext_conf    0.1.0                    py37_0  
_libgcc_mutex             0.1                        main  
alabaster                 0.7.12                   py37_0  
anaconda                  2020.02                  py37_0  
...

If you see the library in the list, you can simply import it into your Python code after loading the Anaconda module.

If you do not find the package you need, you should be able to install the library in your own Anaconda customization. First try to install it with Conda or Pip. If the package is not available from either Conda or Pip, you may be able to install it from source.

Use the following instructions as a guideline for installing packages from source. Make sure you have a download link to the software (usually it will be a tar.gz archive file). You will substitute it on the wget line below.

We also assume that you have already created an empty conda environment as described in our Python package installation guide.

$ mkdir ~/src
$ cd ~/src
$ wget http://path/to/source/tarball/app-1.0.tar.gz
$ tar xzvf app-1.0.tar.gz
$ cd app-1.0
$ module load anaconda
$ module load use.own
$ module load conda-env/mypackages-py3.8.5
$ python setup.py install
$ cd ~
$ python
>>> import app
>>> quit()

The "import app" line should return without any output if installed successfully. You can then import the package in your python scripts.

If you need further help or run into any issues installing a library, contact us or drop by Coffee Hour for in-person help.

For more information about Python:

Example: Create and Use Biopython Environment with Conda

Link to section 'Using conda to create an environment that uses the biopython package' of 'Example: Create and Use Biopython Environment with Conda' Using conda to create an environment that uses the biopython package

To use Conda you must first load the anaconda module:

module load anaconda

Create an empty conda environment to install biopython:

conda-env-mod create -n biopython

Now activate the biopython environment:

module load use.own
module load conda-env/biopython-py3.8.5

Install the biopython packages in your environment:

conda install --channel anaconda biopython -y
Fetching package metadata ..........
Solving package specifications .........
.......
Linking packages ...
[    COMPLETE    ]|################################################################

The --channel option specifies that it searches the anaconda channel for the biopython package. The -y argument is optional and allows you to skip the installation prompt. A list of packages will be displayed as they are installed.

Remember to add the following lines to your job submission script to use the custom environment in your jobs:

module load anaconda
module load use.own
module load conda-env/biopython-py3.8.5

If you need further help or run into any issues with creating environments, contact us or drop by Coffee Hour for in-person help.

For more information about Python:

Numpy Parallel Behavior

The widely available Numpy package is the best way to handle numerical computation in Python. The numpy package provided by our anaconda modules is optimized using Intel's MKL library. It will automatically parallelize many operations to make use of all the cores available on a machine.

In many contexts that would be the ideal behavior. On the cluster however that very likely is not in fact the preferred behavior because often more than one user is present on the system and/or more than one job on a node. Having multiple processes contend for those resources will actually result in lesser performance.

Setting the MKL_NUM_THREADS or OMP_NUM_THREADS environment variable(s) allows you to control this behavior. Our anaconda modules automatically set these variables to 1 if and only if you do not currently have that variable defined.

When submitting batch jobs it is always a good idea to be explicit rather than implicit. If you are submitting a job that you want to make use of the full resources available on the node, set one or both of these variables to the number of cores you want to allow numpy to make use of.

#!/bin/bash


module load anaconda
export MKL_NUM_THREADS=20

...

If you are submitting multiple jobs that you intend to be scheduled together on the same node, it is probably best to restrict numpy to a single core.

#!/bin/bash


module load anaconda
export MKL_NUM_THREADS=1
Helpful?

Thanks for letting us know.

Please don't include any personal information in your comment. Maximum character limit is 250.
Characters left: 250
Thanks for your feedback.