ML-Toolkit

ITaP maintains a set of popular machine learning (ML) applications on Hammer. These are Anaconda/Python-based distributions of the respective applications. Currently, applications are supported for two major Python versions (2.7 and 3.6). Detailed instructions for searching and using the installed ML applications are presented below.

Important: You must load one of the learning modules described below before loading the ML applications.

Instructions for using ML packages

Prerequisite

Make sure your Python environment is clean. Python is very sensitive to packages installed in your local pip folder or in your Conda environments, so it is always safer to start with a clean environment. The steps below archive all your existing Python packages to backup directories, reducing the chance of conflicts. (If you need the old packages later, you can move the backup directories back.)

$ mv ~/.conda ~/.conda.bak
$ mv ~/.local ~/.local.bak
$ mv ~/.cache ~/.cache.bak

Find installed ML applications

To search for or load a machine learning application, you must first load one of the learning modules. The learning module loads the prerequisites (such as anaconda) and makes the ML applications visible to the user.

Step 1. Find and load a preferred learning module.

There are two learning modules available on Hammer, each corresponding to a specific Python version. In the example below, we want to use the learning module for Python 3.6.

$ module spider learning

----------------------------------------------------------------------------
  learning:
----------------------------------------------------------------------------
     Versions:
        learning/conda-5.1.0-py27-cpu
        learning/conda-5.1.0-py36-cpu

.........
$ module load learning/conda-5.1.0-py36-cpu

Step 2. Find a machine learning application.

You can now use the module spider command to find installed applications. The following example searches for available PyTorch installations.

$ module spider pytorch

---------------------------------------------------------------------------------
  ml-toolkit-cpu/pytorch: ml-toolkit-cpu/pytorch/0.4.0
---------------------------------------------------------------------------------

    This module can be loaded directly: module load ml-toolkit-cpu/pytorch/0.4.0 

Step 3. List all machine learning applications.

Note that the ML packages are installed under the common application name ml-toolkit-cpu. To list all machine learning packages installed on Hammer, run the command:

$ module spider ml-toolkit-cpu

Currently, ml-toolkit-cpu includes 9 popular ML packages listed below.

ml-toolkit-cpu/caffe/1.0.0
ml-toolkit-cpu/cntk/2.3
ml-toolkit-cpu/gym/0.10.5
ml-toolkit-cpu/keras/2.1.5
ml-toolkit-cpu/opencv/3.4.1
ml-toolkit-cpu/pytorch/0.4.0
ml-toolkit-cpu/tensorflow/1.4.0
ml-toolkit-cpu/tflearn/0.3.2
ml-toolkit-cpu/theano/1.0.2

Load and use the ML applications

Step 4. After loading a preferred learning module in Step 1, you can now load the desired ML applications in your environment. In the following example, we load the OpenCV and PyTorch modules.

$ module load ml-toolkit-cpu/opencv/3.4.1
$ module load ml-toolkit-cpu/pytorch/0.4.0

Step 5. You can list which ML applications are loaded in your environment using the command:

$ module list

Verify application import

Step 6. The next step is to check that you can actually use the desired ML application. You can do this by running the import command in Python.

$ python -c "import torch; print(torch.__version__)"

If the import succeeded, you can run your own ML code. A few ML applications (such as TensorFlow) print diagnostic warnings while loading; this is expected behavior.
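The single-package check above can be generalized. The helper below is a minimal sketch that reports each package's version, or a failure message if the import fails; the package names passed to it are assumptions matching the earlier examples, so substitute whichever applications you loaded.

```python
import importlib

def check_import(name):
    """Try to import a package and report its version (or the failure)."""
    try:
        module = importlib.import_module(name)
        version = getattr(module, "__version__", "unknown version")
        return "{}: OK ({})".format(name, version)
    except ImportError as exc:
        return "{}: FAILED ({})".format(name, exc)

# "torch" and "cv2" are the import names of the PyTorch and OpenCV
# modules loaded earlier; adjust the list to match your environment.
for pkg in ["torch", "cv2"]:
    print(check_import(pkg))
```

Running this once after loading your modules catches a broken environment before you submit a long batch job.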

If the import failed with an error, please see the troubleshooting information below.

Step 7. To load a different set of applications, unload the previously loaded applications and load the new applications. The example below loads Tensorflow and Keras instead of PyTorch and OpenCV.

$ module unload ml-toolkit-cpu/opencv/3.4.1
$ module unload ml-toolkit-cpu/pytorch/0.4.0
$ module load ml-toolkit-cpu/tensorflow/1.4.0
$ module load ml-toolkit-cpu/keras/2.1.5

Troubleshooting

ML applications depend on a wide range of Python packages, and mixing multiple versions of these packages can lead to errors. The following guidelines will assist you in identifying the cause of the problem.

  • Check that you are using the correct version of Python with the command python --version. This should match the Python version in the loaded anaconda module.
  • Make sure that your Python environment is clean. Follow the instructions in the "Prerequisite" section above.
  • Start from a clean environment. Either start a new terminal session or unload all the modules: module purge. Then load the desired modules following Steps 1-4.
  • Verify that PYTHONPATH does not point to undesired packages. Run the following command to print PYTHONPATH: echo $PYTHONPATH
  • Note that Caffe has a conflicting version of PyQt5, so if you want to use Spyder (or any GUI application that uses PyQt), you should unload the caffe module.
  • Use web search to your advantage. Paste the error message into a search engine and check the probable causes.
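When working through the checks above, it helps to see exactly which interpreter Python is running and what PYTHONPATH contains. The stdlib-only snippet below prints both; nothing in it is site-specific.

```python
import os
import sys

# Which interpreter is actually running? This should match the
# Python version provided by the loaded anaconda module.
print("executable:", sys.executable)
print("version:   ", "%d.%d.%d" % sys.version_info[:3])

# PYTHONPATH entries take precedence over environment packages and
# are a common source of version mixing.
print("PYTHONPATH:", os.environ.get("PYTHONPATH", "(not set)"))
```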

More examples showing how to use ml-toolkit modules in a batch job are presented in this guide.

Installing ML applications

If the ML application you are trying to use is not in the list of supported applications, or if you need a newer version of an installed application, you can install it in your home directory. We recommend using anaconda environments to install and manage Python packages. Please follow the steps carefully; otherwise you may end up with a faulty installation. The example below shows how to install PyTorch 0.4.1 (a newer version) in your home directory.

Step 1: Unload all modules and start with a clean environment.

$ module purge

Step 2: Load the anaconda module with desired Python version.

$ module load anaconda/5.1.0-py36

Step 3: Create a custom anaconda environment. Make sure the Python version matches the Python version in the anaconda module.

$ conda-env-mod create -n env_name_here

Step 4: Activate the anaconda environment by loading the modules displayed at the end of step 3.

$ module load use.own
$ module load conda-env/env_name_here-py3.6.4

Step 5: Now install the desired ML application. You can install multiple Python packages at this step using either conda or pip.

$ conda install -c pytorch pytorch-cpu=0.4.1

If the installation succeeded, you can now use the installed application.
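To confirm that the custom environment supplies the newer version rather than an older ml-toolkit build, compare dotted version strings numerically instead of lexically (string comparison would rank "0.10.0" below "0.4.0"). This helper is an illustration, not part of the toolkit:

```python
def version_tuple(version):
    """Turn a dotted version string like '0.4.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

# e.g. after "import torch", compare torch.__version__ against the minimum:
print(version_tuple("0.4.1") >= version_tuple("0.4.0"))   # True
print(version_tuple("0.10.0") >= version_tuple("0.4.0"))  # True (numeric, not lexical)
```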

Note that loading the modules generated by conda-env-mod behaves differently from conda create -n env_name_here followed by source activate env_name_here. After running source activate, you may not be able to access Python packages in the anaconda or ml-toolkit modules. Therefore, conda-env-mod is the preferred way of using your custom installations.
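A quick way to see the effect of either approach is to inspect sys.path, the ordered list of directories Python searches for packages. After loading your conda-env module (and, optionally, the anaconda or ml-toolkit modules), run:

```python
import sys

# Directories are searched in order. If an expected anaconda or
# ml-toolkit directory is missing from this list, packages provided
# by that module will not be importable.
for index, path in enumerate(sys.path):
    print(index, path)
```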

Troubleshooting

In most situations, dependency conflicts among Python packages lead to errors. If you cannot use a Python package after installing it, please follow the steps below to find a workaround.

  • Unload all the modules.
    $ module purge
    
  • Clean up PYTHONPATH.
    $ unset PYTHONPATH
  • Next load the modules, e.g., anaconda and your custom environment.
    $ module load anaconda/5.1.0-py36
    $ module load use.own
    $ module load conda-env/env_name_here-py3.6.4
    
  • Now try running your code again.
  • A few applications run only on specific versions of Python (e.g., Python 3.6). Please check the documentation of your application if that is the case.
  • If you have installed a newer version of an ml-toolkit package (e.g., a newer version of PyTorch or TensorFlow), make sure that the ml-toolkit modules are NOT loaded. In general, we recommend that you do not mix ml-toolkit modules with your custom installations.