Radon User Guide

    Overview of Radon

    Accounts
        Logging In
            Passwords
            SSH Client Software
            SSH Keys
            ThinLinc
            SSH X11 Forwarding

    File Storage and Transfer
        Archive and Compression
        Environment Variables
        Storage Options
            Home Directory
            Long-Term Storage
            Scratch Space
            /tmp Directory

        Storage Quota / Limits
        File Transfer
            SCP
            Globus
            Windows Network Drive / SMB
            FTP / SFTP


    Applications
        Environment Management with the Module Command

    Compiling Source Code
        Compiling Serial Programs
        Compiling MPI Programs
        Compiling OpenMP Programs
        Compiling Hybrid Programs
        Intel MKL Library
        Provided Compilers
            GNU Compilers
            Intel Compilers
            PGI Compilers


    Running Jobs
        Basics of PBS Jobs
            Job Submission Script
            Submitting a Job
            Checking Job Status
            Checking Job Output
            Holding a Job
            Job Dependencies
            Canceling a Job

        Example Jobs
            Generic PBS Jobs
                Batch
                Multiple Node
                Interactive Jobs
                Serial Jobs
                MPI
                Hybrid

            Specific Applications
                Gaussian
                Maple
                Mathematica
                Matlab
                    Matlab Script (.m File)
                    Implicit Parallelism
                    Profile Manager
                    Parallel Computing Toolbox (parfor)
                    Distributed Computing Server (parallel job)

                Octave
                Perl
                Python
                R
                SAS
                Spark




    Common error messages
        bash: command not found
        qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu
        /usr/bin/xauth: error in locking authority file

    Common Questions
        What is the "debug" queue?
        How can my collaborators outside Purdue get access to Radon?
        Do I need to do anything to my firewall to access Radon?


Overview of Radon

Radon is a compute cluster operated by ITaP for general campus use. Radon consists of 45 HP Moonshot compute nodes, each with 32 GB of RAM, connected by 10 Gigabit Ethernet (10GigE).

Radon Detailed Hardware Specification

Radon consists of one sub-cluster, "E". The nodes have 2.5 GHz quad-core, Hyper-Threading-enabled Intel Xeon E3-1284L CPUs (8 logical cores), 32 GB of RAM, and 10 Gigabit Ethernet.

Sub-Cluster  Number of Nodes  Processors per Node                         Cores per Node  Memory per Node  Interconnect  TeraFLOPS
Radon-E      45               One Hyper-Threaded Quad-Core Xeon E3-1284L  8 (Logical)     32 GB            10 GigE       N/A

Radon nodes run Red Hat Enterprise Linux 6 (RHEL6) and use Moab Workload Manager 8 and TORQUE Resource Manager 5 as the portable batch system (PBS) for resource and job management. The application of operating system patches occurs as security needs dictate. All nodes allow for unlimited stack usage, as well as unlimited core dump size (though disk space and server quotas may still be a limiting factor).

On Radon, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • Intel 13.1.1.163
  • MKL
  • Intel MPI

To load the recommended set:

$ module load devel

To verify what you loaded:

$ module list

Accounts

Accounts on Radon

Obtaining an Account

All Purdue faculty and staff, as well as students with the approval of their advisor, may request access to Radon. Refer to the Accounts / Access page for more details on how to request access.

Outside Collaborators

Your Departmental Business Office can submit a Request for Privileges (R4P) to provide access to collaborators outside Purdue, including recent graduates. Instructions are at http://www.purdue.edu/hr/pdf/r4pRequestorInstructions.pdf and the Request form is at https://www.purdue.edu/apps/account/r4p

More Accounts Information

    Logging In
        Passwords
        SSH Client Software
        SSH Keys
        ThinLinc
        SSH X11 Forwarding

Logging In

To submit jobs on Radon, log in to the submission host radon.rcac.purdue.edu via SSH. This submission host is actually two front-end hosts: radon-fe00 and radon-fe01. The login process randomly assigns one of these front-ends to each login to radon.rcac.purdue.edu. While all of these front-end hosts are identical, each has its own /tmp, so sharing data in /tmp across subsequent sessions may fail. ITaP advises using scratch space for multi-session shared data instead.
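
For example, from a terminal on your own machine, where myusername is your career account name:

$ ssh myusername@radon.rcac.purdue.edu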

    Passwords
    SSH Client Software
    SSH Keys
    ThinLinc
    SSH X11 Forwarding

Passwords

If you have received a default password as part of the process of obtaining your account, you should change it before you log onto Radon for the first time. Change your password from the SecurePurdue website. You will have the same password on all ITaP systems such as Radon, Purdue email, or Blackboard.

Passwords may need to be changed periodically in accordance with Purdue security policies. Passwords must follow the guidelines described on the SecurePurdue webpage, and ITaP recommends following those guidelines to select a strong password.

ITaP staff will NEVER ask for your password, by email or otherwise.

Never share your password with another user or make your password known to anyone else.

SSH Client Software

Secure Shell or SSH is a way of establishing a secure connection between two computers. It uses public-key cryptography to authenticate the user with the remote computer and to establish a secure connection. Its usual function involves logging in to a remote machine and executing commands. There are many SSH clients available for all operating systems:

Linux / Solaris / AIX / HP-UX / Unix:

  • The ssh command is pre-installed. Log in using ssh myusername@radon.rcac.purdue.edu from a terminal.

Microsoft Windows:

  • MobaXterm is a small, easy to use, full-featured SSH client. It includes X11 support for remote displays, SFTP capabilities, and limited SSH authentication forwarding for keys.
  • PuTTY is an extremely small download of a free, full-featured SSH client.
  • Pageant is an extremely small program used for SSH authentication.

Mac OS X:

  • The ssh command is pre-installed. You may start a local terminal window from "Applications->Utilities". Log in by typing the command ssh myusername@radon.rcac.purdue.edu.

Accounts Logging In SSH Keys

SSH Keys

SSH works with many different means of authentication. One popular authentication method is Public Key Authentication (PKA). PKA is a method of establishing your identity to a remote computer using related sets of encryption data called keys. PKA is a more secure alternative to traditional password-based authentication with which you are probably familiar.

To employ PKA via SSH, you manually generate a keypair (also called SSH keys) in the location from where you wish to initiate a connection to a remote machine. This keypair consists of two text files: a private key and a public key. You keep the private key file confidential on your local machine or local home directory (hence the name "private" key). You then log in to the remote machine (if possible) and append the corresponding public key text to the end of a specific file (typically ~/.ssh/authorized_keys), or have a system administrator do so on your behalf. In future login attempts, PKA compares the public and private keys to verify your identity; only then do you have access to the remote machine.

As a user, you can create, maintain, and employ as many keypairs as you wish. If you connect to a computational resource from your work laptop, your work desktop, and your home desktop, you can create and employ keypairs on each. You can also create multiple keypairs on a single local machine to serve different purposes, such as establishing access to different remote machines or establishing different types of access to a single remote machine. In short, PKA via SSH offers a secure but flexible means of identifying yourself to all kinds of computational resources.
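
For example, a minimal way to set this up from a Linux or Mac OS X machine, assuming OpenSSH is installed locally (the key type and paths shown are just the usual defaults):

  (generate the keypair; enter a passphrase when prompted)
$ ssh-keygen -t rsa

  (append the public key to ~/.ssh/authorized_keys on Radon)
$ ssh-copy-id myusername@radon.rcac.purdue.edu

If ssh-copy-id is not available, you can instead copy the contents of the .pub file and append it to ~/.ssh/authorized_keys on Radon yourself.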

Passphrases and SSH Keys

Creating a keypair prompts you to provide a passphrase for the private key. This passphrase is different from a password in a number of ways. First, a passphrase is, as the name implies, a phrase. It can include most types of characters, including spaces, and has no limit on length. Second, the remote machine does not receive this passphrase for verification. Its only purpose is to allow the use of your local private key, and it applies only to that particular private key.

Perhaps you are wondering why you would need a private key passphrase at all when using PKA. If the private key remains secure, why the need for a passphrase just to use it? Indeed, if the location of your private keys were always completely secure, a passphrase might not be necessary. In reality, a number of situations could arise in which someone may improperly gain access to your private key files. In these situations, a passphrase offers another level of security for you, the user who created the keypair.

Think of the private key/passphrase combination as being analogous to your ATM card/PIN combination. The ATM card itself is the object that grants access to your important accounts, and as such, should remain secure at all times—just as a private key should. But if you ever lose your wallet or someone steals your ATM card, you are glad that your PIN exists to offer another level of protection. The same is true for a private key passphrase.

When you create a keypair, you should always provide a corresponding private key passphrase. For security purposes, avoid using phrases which automated programs can discover (e.g. phrases that consist solely of words in English-language dictionaries). This passphrase is not recoverable if forgotten, so make note of it. Only a few situations warrant using a non-passphrase-protected private key—conducting automated file backups is one such situation.

ThinLinc

ITaP Research Computing provides ThinLinc as an alternative to running an X11 server directly on your computer. It allows you to run graphical applications or graphical interactive jobs directly on Radon through a persistent remote graphical desktop session.

ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. This service works very well over a high-latency, low-bandwidth, or off-campus connection compared to running an X11 server locally. It is also very helpful for Windows users who do not have an easy-to-use local X11 server, as little to no setup is required on your computer.

There are two ways to use ThinLinc: through the native client (preferred) or through a web browser.

Installing the ThinLinc native client

The native ThinLinc client will offer the best experience especially over off-campus connections and is the recommended method for using ThinLinc. It is compatible with Windows, Mac OS X, and Linux.

  • Download the ThinLinc client from the ThinLinc website.
  • Start the ThinLinc client on your computer.
  • In the client's login window, use thinlinc.rcac.purdue.edu as the Server. Use your Purdue Career Account username and password.
  • Click the Connect button.
  • Continue to the following section on connecting to Radon from ThinLinc.

Using ThinLinc through your web browser

The ThinLinc service can be accessed from your web browser as an alternative to installing the native client. This option requires no setup and is a good choice for computers where you do not have privileges to install software. All that is required is an up-to-date web browser. Older versions of Internet Explorer may not work.

  • Open a web browser and navigate to thinlinc.rcac.purdue.edu.
  • Log in with your Purdue Career Account username and password.
  • You may safely proceed past any warning messages from your browser.
  • Continue to the following section on connecting to Radon from ThinLinc.

Connecting to Radon from ThinLinc

  • Once logged in, you will be presented with a remote Linux desktop.
  • Open the terminal application on the remote desktop.
  • Log in to the submission host radon.rcac.purdue.edu with X forwarding enabled using the following command:
    $ ssh -Y radon.rcac.purdue.edu 
  • Once logged in to the Radon head node, you may use graphical editors, debuggers, software like Matlab, or run graphical interactive jobs. For example, to test the X forwarding connection issue the following command to launch the graphical editor gedit:
    $ gedit
  • This session will remain persistent even if you disconnect from the session. Any interactive jobs or applications you left running will continue running even if you are not connected to the session.

Tips for using ThinLinc native client

  • To exit a full screen ThinLinc session press the F8 key on your keyboard (fn + F8 key for Mac users) and click to disconnect or exit full screen.
  • Full screen mode can be disabled when connecting to a session by clicking the Options button and disabling full screen mode from the Screen tab.

SSH X11 Forwarding

SSH supports tunneling of X11 (X-Windows). If you have an X11 server running on your local machine, you may use X11 applications on remote systems and have their graphical displays appear on your local machine. These X11 connections are tunneled and encrypted automatically by your SSH client.

Installing an X11 Server

To use X11, you will need to have a local X11 server running on your personal machine. Both free and commercial X11 servers are available for various operating systems.

Linux / Solaris / AIX / HP-UX / Unix:

  • An X11 server is at the core of all graphical sessions. If you are logged in to a graphical environment on these operating systems, you are already running an X11 server.
  • ThinLinc is an alternative to running an X11 server directly on your Linux computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. See the ThinLinc section for information on using it.

Microsoft Windows:

  • ThinLinc is an alternative to running an X11 server directly on your Windows computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. See the ThinLinc section for information on using it.
  • MobaXterm is a small, easy to use, full-featured SSH client. It includes X11 support for remote displays, SFTP capabilities, and limited SSH authentication forwarding for keys.
  • Xming is a free X11 server available for all versions of Windows, although it may occasionally hang and require a restart. Download the "Public Domain Xming" or donate to the development for the newest version.
  • Hummingbird eXceed is a commercial X11 server available for all versions of Windows.
  • Cygwin is another free X11 server available for all versions of Windows. Download and run setup.exe. During installation, you must select the following packages, which are not included by default: X-startup-scripts, XFree86-lib-compat, xorg-*, xterm, xwinwm, lib-glitz-glx1, and opengl (under the Graphics group, if you also want OpenGL support).
  • Once you are running the Cygwin X server, start an xterm, type XWin -multiwindow in it, and then press enter. You may now run your SSH client.

Mac OS X:

  • X11 is available as an optional install on the Mac OS X install disks prior to 10.7/Lion. Run the installer, select the X11 option, and follow the instructions. For 10.7+ please download XQuartz.
  • ThinLinc is an alternative to running an X11 server directly on your Mac computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. See the ThinLinc section for information on using it.

Enabling X11 Forwarding in your SSH Client

Once you are running an X11 server, you will need to enable X11 forwarding/tunneling in your SSH client:

  • "ssh": X11 tunneling should be enabled by default. To be certain it is enabled, you may use ssh -Y.
  • PuTTY: Prior to connection, in your connection's options, under "X11", check "Enable X11 forwarding", and save your connection.
  • MobaXterm: Select "New session" and "SSH." Under "Advanced SSH Settings" check the box for X11 Forwarding.

SSH will set the remote environment variable $DISPLAY to "localhost:XX.YY" when this is working correctly. If you had previously set your $DISPLAY environment variable to your local IP or hostname, you must remove any set/export/setenv of this variable from your login scripts. The environment variable $DISPLAY must be left as SSH sets it, which is to a random local port address. Setting $DISPLAY to an IP or hostname will not work.
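
To confirm forwarding is working after logging in with ssh -Y, you can check the variable and launch a small graphical program (the exact display number will vary; gedit is just one convenient test application):

$ echo $DISPLAY
localhost:10.0
$ gedit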

File Storage and Transfer

File Storage and Transfer for Radon

    Archive and Compression
    Environment Variables
    Storage Options
        Home Directory
        Long-Term Storage
        Scratch Space
        /tmp Directory

    Storage Quota / Limits
    File Transfer
        SCP
        Globus
        Windows Network Drive / SMB
        FTP / SFTP


Archive and Compression

There are several options for archiving and compressing groups of files or directories on ITaP research systems. The most commonly used options are:

tar

Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.

Examples:

  (list contents of archive somefile.tar)
$ tar tvf somefile.tar

  (extract contents of somefile.tar)
$ tar xvf somefile.tar

  (extract contents of gzipped archive somefile.tar.gz)
$ tar xzvf somefile.tar.gz

  (extract contents of bzip2 archive somefile.tar.bz2)
$ tar xjvf somefile.tar.bz2

  (archive all ".c" files in current directory into one archive file)
$ tar cvf somefile.tar *.c

  (archive and gzip-compress all files in a directory into one archive file)
$ tar czvf somefile.tar.gz somedirectory/

  (archive and bzip2-compress all files in a directory into one archive file)
$ tar cjvf somefile.tar.bz2 somedirectory/

Other arguments for tar can be explored by using the man tar command.

gzip

The standard compression system for all GNU software.

Examples:

  (compress file somefile - also removes uncompressed file)
$ gzip somefile

  (uncompress file somefile.gz - also removes compressed file)
$ gunzip somefile.gz

bzip2

Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.

Examples:

  (compress file somefile - also removes uncompressed file)
$ bzip2 somefile

  (uncompress file somefile.bz2 - also removes compressed file)
$ bunzip2 somefile.bz2

There are several other, less commonly used, options available as well:

  • zip
  • 7zip
  • xz
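
For reference, and assuming the zip and xz packages are installed on the system you are using, basic usage of two of these looks like the following:

  (archive a directory into a zip file, and extract it)
$ zip -r somefile.zip somedirectory/
$ unzip somefile.zip

  (compress a file with xz - also removes the uncompressed file - and uncompress it)
$ xz somefile
$ unxz somefile.xz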

Environment Variables

Several environment variables are automatically defined for you to help you manage your storage. Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change. Some of the environment variables you should have are:

Name Description
HOME path to your home directory
PWD path to your current directory
RCAC_SCRATCH path to scratch filesystem

By convention, environment variable names are all uppercase. You may use them on the command line or in any scripts in place of and in combination with hard-coded values:

$ ls $HOME ...

$ ls $RCAC_SCRATCH/myproject ...

To find the value of any environment variable:

$ echo $RCAC_SCRATCH
/scratch/radon/m/myusername

To list the values of all environment variables:

$ env
USER=myusername
HOME=/home/myusername
RCAC_SCRATCH=/scratch/radon/m/myusername
...

You may create or overwrite an environment variable. To pass (export) the value of a variable in bash:

$ export MYPROJECT=$RCAC_SCRATCH/myproject

To assign a value to an environment variable in either tcsh or csh:

$ setenv MYPROJECT value

Storage Options

File storage options on ITaP research systems include long-term storage (home directories, Fortress) and short-term storage (scratch directories, /tmp directory). Each option has different performance and intended uses, and some options vary from system to system as well. ITaP provides daily snapshots of home directories for a limited time for accidental deletion recovery. ITaP does not back up scratch directories or temporary storage and regularly purges old files from scratch and /tmp directories. More details about each storage option appear below.

    Home Directory
    Long-Term Storage
    Scratch Space
    /tmp Directory

Home Directory

ITaP provides home directories for long-term file storage. Each user has one home directory. You should use your home directory for storing important program files, scripts, input data sets, critical results, and frequently used files. You should store infrequently used files on Fortress. Your home directory becomes your current working directory, by default, when you log in.

ITaP provides daily snapshots of your home directory for a limited period of time in the event of accidental deletion. For additional security, you should store another copy of your files on more permanent storage, such as the Fortress HPSS Archive.

Your home directory physically resides within the Isilon storage system at Purdue. To find the path to your home directory, first log in then immediately enter the following:

$ pwd
/home/myusername

Or from any subdirectory:

$ echo $HOME
/home/myusername

Your home directory and its contents are available on all ITaP research computing machines, including front-end hosts and compute nodes.

Your home directory has a quota limiting the total size of files you may store within. For more information, refer to the Storage Quotas / Limits Section.

Lost File Recovery

Only files which have been snap-shotted overnight are recoverable. If you lose a file the same day you created it, it is NOT recoverable.

To recover files lost from your home directory, use the flost command:

$ flost

Long-Term Storage

Long-term Storage or Permanent Storage is available to ITaP research users on the High Performance Storage System (HPSS), an archival storage system, called Fortress. Program files, data files and any other files which are not used often, but which must be saved, can be put in permanent storage. Fortress currently has over 10PB of capacity.

For more information about Fortress, including how it works, user guides, and how to obtain an account, see the Fortress documentation.

Scratch Space

ITaP provides scratch directories for short-term file storage only. The quota of your scratch directory is much greater than the quota of your home directory. You should use your scratch directory for storing temporary input files which your job reads or for writing temporary output files which you may examine after execution of your job. You should use your home directory and Fortress for longer-term storage or for holding critical results. The hsi and htar commands provide easy-to-use interfaces into the archive and can be used to copy files into the archive interactively or even automatically at the end of your regular job submission scripts.
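
For example, the following uses htar to bundle a results directory from scratch into a tar archive stored on Fortress, and to list or retrieve it later (the directory and archive names are only illustrative):

  (create an archive on Fortress from a scratch directory)
$ htar -cvf myresults.tar $RCAC_SCRATCH/myresults

  (list the contents of the archive)
$ htar -tvf myresults.tar

  (extract the archive contents)
$ htar -xvf myresults.tar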

Files in scratch directories are not recoverable. ITaP does not back up files in scratch directories. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

ITaP purges files from scratch directories that have not been accessed or modified in 90 days. Owners of these files receive an email notice one week before removal. Be sure to regularly check your Purdue email account or set up mail forwarding to an email account you do regularly check. For more information, please refer to our Scratch File Purging Policy.

All users may access scratch directories on Radon. To find the path to your scratch directory:

$ findscratch
/scratch/radon/m/myusername

The value of variable $RCAC_SCRATCH is your scratch directory path. Use this variable in any scripts. Your actual scratch directory path may change without warning, but this variable will remain current.

$ echo $RCAC_SCRATCH
/scratch/radon/m/myusername

All scratch directories are available on each front-end of all computational resources; however, only the /scratch/radon directory is available on Radon compute nodes. No other scratch directories are available on Radon compute nodes.

Your scratch directory has a quota capping the total size and number of files you may store in it. For more information, refer to the Storage Quotas / Limits section.

/tmp Directory

ITaP provides /tmp directories for short-term file storage only. Each front-end and compute node has a /tmp directory. Your program may write temporary data to the /tmp directory of the compute node on which it is running. That data is available for as long as your program is active. Once your program terminates, that temporary data is no longer available. When used properly, /tmp may provide faster local storage to an active process than any other storage option. You should use your home directory and Fortress for longer-term storage or for holding critical results.

ITaP does not perform backups for the /tmp directory and removes files from /tmp whenever space is low or whenever the system needs a reboot. In the event of a disk crash or file purge, files in /tmp are not recoverable. You should copy any important files to more permanent storage.

Storage Quota / Limits

ITaP imposes some limits on your disk usage on research systems. ITaP implements a quota on each filesystem. Each filesystem (home directory, scratch directory, etc.) may have a different limit. If you exceed the quota, you will not be able to save new files or new data to the filesystem until you delete or move data to long-term storage.

Checking Quota

To check the current quotas of your home and scratch directories check the My Quota page or use the myquota command:

$ myquota
Type        Filesystem          Size     Limit  Use         Files    Limit  Use
==============================================================================
home        extensible         5.0GB    10.0GB  50%             -        -   -
scratch     /scratch/radon/      8KB   476.8GB   0%             2  100,000   0%

The columns are as follows:

  • Type: indicates home or scratch directory.
  • Filesystem: name of storage option.
  • Size: sum of file sizes in bytes.
  • Limit: allowed maximum on sum of file sizes in bytes.
  • Use: percentage of file-size limit currently in use.
  • Files: number of files and directories (not the size).
  • Limit: allowed maximum on number of files and directories. It is possible, though unlikely, to reach this limit and not the file-size limit if you create a large number of very small files.
  • Use: percentage of file-number limit currently in use.

If you find that you reached your quota in either your home directory or your scratch file directory, obtain estimates of your disk usage. Find the top-level directories which have a high disk usage, then study the subdirectories to discover where the heaviest usage lies.

To see in a human-readable format an estimate of the disk usage of your top-level directories in your home directory:

$ du -h --max-depth=1 $HOME
32K     /home/myusername/mysubdirectory_1
529M    /home/myusername/mysubdirectory_2
608K    /home/myusername/mysubdirectory_3

The second directory is the largest of the three, so run du on it next to see which of its subdirectories hold the most data.
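
For example, repeating the same command one level down (using the illustrative directory name from the listing above):

$ du -h --max-depth=1 /home/myusername/mysubdirectory_2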

To see in a human-readable format an estimate of the disk usage of your top-level directories in your scratch file directory:

$ du -h --max-depth=1 $RCAC_SCRATCH
160K    /scratch/radon/m/myusername

This strategy can be very helpful in figuring out the location of your largest usage. Move unneeded files and directories to long-term storage to free space in your home and scratch directories.

Increasing Quota

Home Directory

If you find you need additional disk space in your home directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. Unfortunately, it is not possible to increase your home directory quota beyond its current level.

Scratch Space

If you find you need additional disk space in your scratch space, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may ask for a quota increase at rcac-help@purdue.edu.

File Transfer

Radon supports several methods for file transfer. Use the links below to learn more about these methods.

    SCP
    Globus
    Windows Network Drive / SMB
    FTP / SFTP

SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH protocol. SCP is available as a protocol choice in some graphical file transfer programs and also as a command line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.

Command-line usage:

  (to a remote system from local)
$ scp sourcefilename myusername@radon.rcac.purdue.edu:somedirectory/destinationfilename

  (from a remote system to local)
$ scp myusername@radon.rcac.purdue.edu:somedirectory/sourcefilename destinationfilename

  (recursive directory copy to a remote system from local)
$ scp -r sourcedirectory/ myusername@radon.rcac.purdue.edu:somedirectory/

Linux / Solaris / AIX / HP-UX / Unix:

  • You should have already installed the "scp" command-line program.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SCP and SFTP client.
  • PuTTY also offers "pscp.exe", which is an extremely small program and a basic SCP client.
  • Secure FX is a commercial SCP and SFTP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • You should have already installed the "scp" command-line program. You may start a local terminal window from "Applications->Utilities".

Globus

Globus, previously known as Globus Online, is a powerful and easy to use file transfer service that is useful for transferring files virtually anywhere. It works within ITaP's various research storage systems; it connects between ITaP and remote research sites running Globus; and it connects research systems to personal systems. You may use Globus to connect to your home, scratch, and Fortress storage directories. Since Globus is web-based, it works on any operating system that is connected to the internet. The Globus Personal client is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used over the command line.

Globus Web:

  • Navigate to http://transfer.rcac.purdue.edu
  • Click "Proceed" to log in with your Purdue Career Account.
  • On your first login it will ask to make a connection to a Globus account. If you already have one, sign in to associate it with your Career Account. Otherwise, click the link to create a new account.
  • Now you're at the main screen. Click "File Transfer" which will bring you to a two-endpoint interface.
  • The endpoint for disk-based storage is named "purdue#rcac", however, you can start typing "purdue" and it will autocomplete.
  • The paths to research storage are the same as they are when you're logged into the clusters, but are provided below for reference.
    • Home directory: /~/
    • Scratch directory: /scratch/radon/m/myusername where m is the first letter of your username and myusername is your career account name.
    • Research Data Depot directory: /depot/mygroupname where mygroupname is the name of your group.
    • Fortress can be accessed at the "purdue#fortress" endpoint.
  • For the second endpoint, you can choose any other Globus endpoint, such as another research site, or a Globus Personal endpoint, which will allow you to transfer to a personal workstation or laptop.

Globus Personal Client setup:

  • On the endpoint page from earlier, click "Get Globus Connect Personal" or download it from here: Globus Connect Personal
  • Name this particular personal system and click "Generate Setup Key" on this page: Create Globus Personal endpoint
  • Copy the key and paste it into the setup box when installing the client for your system.
  • Your personal system is now available as an endpoint within the Globus transfer interface.

Globus Command Line:

For more information, please see Globus Support.

Windows Network Drive / SMB

SMB (Server Message Block), also known as CIFS, is an easy to use file transfer protocol that is useful for transferring files between ITaP research systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used over the command line.

Note: to access Radon through SMB file sharing, you must be on a Purdue campus network or connected through VPN.

Windows:

  • Windows 7: Click Windows menu > Computer, then click Map Network Drive in the top bar
  • Windows 8.1: Tap the Windows key, type computer, select This PC, click Computer > Map Network Drive in the top bar
  • In the folder location enter the following information and click Finish:

    • To access your home directory, enter \\samba.rcac.purdue.edu\myusername where myusername is your career account name.
    • To access your scratch space on Radon, enter \\samba.rcac.purdue.edu\scratch. Once mapped, you will be able to navigate to radon\m\myusername where m is the first letter of your username and myusername is your career account name. You may also navigate to any of the other cluster scratch directories from this drive mapping.
    • To access your Fortress long-term storage home directory, enter \\fortress-smb.rcac.purdue.edu\myusername where myusername is your career account name.
    • To access a shared Fortress group storage directory, enter \\fortress-smb.rcac.purdue.edu\group\mygroupname where mygroupname is the name of the shared group space.

  • You may be prompted for login information. Enter your username as onepurdue\myusername and your account password. If you forget the onepurdue prefix it will prevent you from logging in.
  • Your home, scratch, or Fortress directory should now be mounted as a drive in the Computer window.

Mac OS X:

  • In the Finder, click Go > Connect to Server
  • In the Server Address enter the following information and click Connect:

    • To access your home directory, enter smb://samba.rcac.purdue.edu/myusername where myusername is your career account name.
    • To access your scratch space on Radon, enter smb://samba.rcac.purdue.edu/scratch. Once connected, you will be able to navigate to radon/m/myusername where m is the first letter of your username and myusername is your career account name. You may also navigate to any of the other cluster scratch directories from this mount.
    • To access your Fortress long-term storage home directory, enter smb://fortress-smb.rcac.purdue.edu/myusername where myusername is your career account name.
    • To access a shared Fortress group storage directory, enter smb://fortress-smb.rcac.purdue.edu/group/mygroupname where mygroupname is the name of the shared group space.

  • You may be prompted for login information. Enter your username and password, and enter onepurdue as the domain; omitting the domain will prevent you from logging in.

Linux:

  • There are several graphical methods to connect in Linux depending on your desktop environment. Once you find out how to connect to a network server on your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
  • If you would like to access Samba from the command line, you may install smbclient, which provides FTP-like access and can be used as shown below. SCP or SFTP is recommended instead for most transfers. For the server addresses to connect to, see the Mac OS X instructions above.
    smbclient //samba.rcac.purdue.edu/myusername -U myusername
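
    Once connected, smbclient presents an FTP-like prompt; for example (standard smbclient commands, shown only as a sketch):
    smb: \> put myfile
    smb: \> get myfile
    smb: \> exit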

FTP / SFTP

ITaP does not support FTP on any ITaP research systems because it does not allow for secure transmission of data. Use SFTP instead, as described below.

SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Command-line SFTP cannot recursively copy directory contents; to do so, try using SCP or a graphical SFTP client.

Command-line usage:

$ sftp -B buffersize myusername@radon.rcac.purdue.edu

      (to a remote system from local)
sftp> put sourcefile somedir/destinationfile
sftp> put -P sourcefile somedir/

      (from a remote system to local)
sftp> get sourcefile somedir/destinationfile
sftp> get -P sourcefile somedir/

sftp> exit
  • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
  • -P: optional, preserve file attributes and permissions

Linux / Solaris / AIX / HP-UX / Unix:

  • The "sftp" command line program should already be installed.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SFTP and SCP client.
  • PuTTY also offers "psftp.exe", which is an extremely small program and a basic SFTP client.
  • Secure FX is a commercial SFTP and SCP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • The "sftp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".

Applications

Provided Applications

A catalog of available software on Radon is automatically generated from a list of software currently available via the module command. The catalog is organized by software categories such as compilers, libraries, and applications broken down by field of science. You may also compare software available across all ITaP Research Computing resources and search the catalog by keywords.

Please contact rcac-help@purdue.edu if you are interested in the availability of software not shown in the catalog.

Environment Management with the Module Command

ITaP uses the module command as the preferred method to manage your processing environment. With this command, you may load applications and compilers along with their libraries and paths. Modules are packages which you load and unload as needed.

Please use the module command and do not manually configure your environment, as ITaP staff may make changes to the specifics of various packages. If you use the module command to manage your environment, these changes will not be noticeable.

To view a brief usage report:

$ module

List Available Modules

To see what modules are available on this system:

$ module avail

To see which versions of a specific compiler are available on this system:

$ module avail gcc
$ module avail intel

To see available modules for MPI libraries:

$ module avail openmpi 
$ module avail impi    
$ module avail mvapich2

To see available versions for specific software packages:

$ module avail abaqus
$ module avail matlab

Load / Unload a Module

All modules consist of both a name and a version number. When loading a module, you may use only the name to load the default version, or you may specify which version you wish to load.

For each cluster, ITaP makes a recommendation regarding the set of compiler, math library, and MPI library for parallel code. To load the recommended set:

$ module load devel

To verify what you loaded:

$ module list

To load the default version of a specific compiler, choose one of the following commands:

$ module load gcc
$ module load intel

To load a specific version of a compiler, include the version number:

$ module load intel/13.1.1.163

When running a job, you must load any relevant modules in your job submission file so that they are available on the compute node(s). Loading modules on the front-end before submitting your job makes the software available to your session on the front-end, but not to your job's environment. You must load the necessary modules in your job submission script.

To unload a compiler or software package you loaded previously:

$ module unload gcc
$ module unload intel
$ module unload matlab

To unload all currently loaded modules and reset your environment:

$ module purge

Show Module Details

To learn more about what a module does to your environment, you may use the module show command. Here is an example showing what loading the default Matlab does to the processing environment:

$ module show matlab
----------------------------------------------------------------------------
 /opt/modules/modulefiles/matlab/R2013a:
----------------------------------------------------------------------------
whatis      invoke MATLAB Release R2013a
setenv      MATLAB "/apps/rhel6/MATLAB/R2013a"
setenv      MLROOT "/apps/rhel6/MATLAB/R2013a"
setenv      ARCH "glnxa64"
prepend_path    PATH "/apps/rhel6/MATLAB/R2013a/bin/glnxa64"
prepend_path    PATH "/apps/rhel6/MATLAB/R2013a/bin"
prepend_path    LD_LIBRARY_PATH "/apps/rhel6/MATLAB/R2013a/runtime/glnxa64"
prepend_path    LD_LIBRARY_PATH "/apps/rhel6/MATLAB/R2013a/bin/glnxa64"
help([[ matlab - Technical Computing Environment
]])

Compiling Source Code

Compiling Serial Programs

A serial program is a single process which executes as a sequential stream of instructions on one processor core. Compilers capable of serial programming are available for C, C++, and versions of Fortran.

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load pgi

The following table illustrates how to compile your serial program:

Language Intel Compiler GNU Compiler PGI Compiler
Fortran 77
$ ifort myprogram.f -o myprogram
$ gfortran myprogram.f -o myprogram
$ pgf77 myprogram.f -o myprogram
Fortran 90
$ ifort myprogram.f90 -o myprogram
$ gfortran myprogram.f90 -o myprogram
$ pgf90 myprogram.f90 -o myprogram
Fortran 95
$ ifort myprogram.f90 -o myprogram
$ gfortran myprogram.f95 -o myprogram
$ pgf95 myprogram.f95 -o myprogram
C
$ icc myprogram.c -o myprogram
$ gcc myprogram.c -o myprogram
$ pgcc myprogram.c -o myprogram
C++
$ icc myprogram.cpp -o myprogram
$ g++ myprogram.cpp -o myprogram
$ pgCC myprogram.cpp -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Compiling MPI Programs

OpenMPI, MVAPICH2, and Intel MPI (IMPI) are implementations of the Message-Passing Interface (MPI) standard. Libraries for these MPI implementations and compilers for C, C++, and Fortran are available on Radon. A full list of MPI library versions installed on Radon is available in the software catalog.

MPI programs require including a header file:

Language Header Files
Fortran 77
INCLUDE 'mpif.h'
Fortran 90
INCLUDE 'mpif.h'
Fortran 95
INCLUDE 'mpif.h'
C
#include <mpi.h>
C++
#include <mpi.h>

To see the available MPI libraries:

$ module avail openmpi 
$ module avail mvapich2 
$ module avail impi

The following table illustrates how to compile your MPI program. Any compiler flags accepted by Intel ifort/icc compilers are compatible with their respective MPI compiler.

Language Intel MPI OpenMPI, MVAPICH2, or Intel MPI (IMPI)
Fortran 77
$ mpiifort program.f -o program
$ mpif77 program.f -o program
Fortran 90
$ mpiifort program.f90 -o program
$ mpif90 program.f90 -o program
Fortran 95
$ mpiifort program.f95 -o program
$ mpif90 program.f95 -o program
C
$ mpiicc program.c -o program
$ mpicc program.c -o program
C++
$ mpiicpc program.C -o program
$ mpiCC program.C -o program

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Compiling OpenMP Programs

All compilers installed on Radon include OpenMP functionality for C, C++, and Fortran. An OpenMP program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over processor cores in a single compute node without the need for MPI communications.

OpenMP programs require including a header file:

Language Header Files
Fortran 77
INCLUDE 'omp_lib.h'
Fortran 90
use omp_lib
Fortran 95
use omp_lib
C
#include <omp.h>
C++
#include <omp.h>

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load pgi

The following table illustrates how to compile your shared-memory program. Any compiler flags accepted by ifort/icc compilers are compatible with OpenMP.

Language Intel Compiler GNU Compiler PGI Compiler
Fortran 77
$ ifort -openmp myprogram.f -o myprogram
$ gfortran -fopenmp myprogram.f -o myprogram
$ pgf77 -mp myprogram.f -o myprogram
Fortran 90
$ ifort -openmp myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f90 -o myprogram
$ pgf90 -mp myprogram.f90 -o myprogram
Fortran 95
$ ifort -openmp myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f95 -o myprogram
$ pgf95 -mp myprogram.f95 -o myprogram
C
$ icc -openmp myprogram.c -o myprogram
$ gcc -fopenmp myprogram.c -o myprogram
$ pgcc -mp myprogram.c -o myprogram
C++
$ icc -openmp myprogram.cpp -o myprogram
$ g++ -fopenmp myprogram.cpp -o myprogram
$ pgCC -mp myprogram.cpp -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".
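
At run time, an OpenMP program uses the OMP_NUM_THREADS environment variable to decide how many threads to start. For example, to run a compiled program with 8 threads, matching the 8 logical cores on a Radon node (normally you would set this inside your job submission script rather than on a front-end):

$ export OMP_NUM_THREADS=8
$ ./myprogram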

Compiling Hybrid Programs

A hybrid program combines both MPI and shared-memory to take advantage of compute clusters with multi-core compute nodes. Libraries for OpenMPI, MVAPICH2, and Intel MPI (IMPI) and compilers which include OpenMP for C, C++, and Fortran are available.

Hybrid programs require including header files:

Language Header Files
Fortran 77
INCLUDE 'omp_lib.h'
INCLUDE 'mpif.h'
Fortran 90
use omp_lib
INCLUDE 'mpif.h'
Fortran 95
use omp_lib
INCLUDE 'mpif.h'
C
#include <mpi.h>
#include <omp.h>
C++
#include <mpi.h>
#include <omp.h>

To see the available MPI libraries:

$ module avail mvapich2    
$ module avail impi    
$ module avail openmpi    

The following table illustrates how to compile your hybrid (MPI/OpenMP) program. Any compiler flags accepted by Intel ifort/icc compilers are compatible with their respective MPI compiler.

Language Intel MPI OpenMPI, MVAPICH2, or Intel MPI (IMPI) with Intel Compiler
Fortran 77
$ mpiifort -openmp myprogram.f -o myprogram
$ mpif77 -openmp myprogram.f -o myprogram
Fortran 90
$ mpiifort -openmp myprogram.f90 -o myprogram
$ mpif90 -openmp myprogram.f90 -o myprogram
Fortran 95
$ mpiifort -openmp myprogram.f90 -o myprogram
$ mpif90 -openmp myprogram.f90 -o myprogram
C
$ mpiicc -openmp myprogram.c -o myprogram
$ mpicc -openmp myprogram.c -o myprogram
C++
$ mpiicpc -openmp myprogram.C -o myprogram
$ mpiCC -openmp myprogram.C -o myprogram
Language OpenMPI, MVAPICH2, or Intel MPI (IMPI) with GNU Compiler OpenMPI, MVAPICH2, or Intel MPI (IMPI) with PGI Compiler
Fortran 77
$ mpif77 -fopenmp myprogram.f -o myprogram
$ mpif77 -mp myprogram.f -o myprogram
Fortran 90
$ mpif90 -fopenmp myprogram.f90 -o myprogram
$ mpif90 -mp myprogram.f90 -o myprogram
Fortran 95
$ mpif90 -fopenmp myprogram.f95 -o myprogram
$ mpif90 -mp myprogram.f95 -o myprogram
C
$ mpicc -fopenmp myprogram.c -o myprogram
$ mpicc -mp myprogram.c -o myprogram
C++
$ mpiCC -fopenmp myprogram.C -o myprogram
$ mpiCC -mp myprogram.C -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".
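
At run time, a hybrid program is typically launched with one or a few MPI ranks per node and several OpenMP threads per rank. As a rough sketch (the launcher name and its options depend on which MPI library you loaded, and this would normally go inside a PBS job script):

$ export OMP_NUM_THREADS=4
$ mpiexec -n 2 ./myprogram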

Intel MKL Library

Intel Math Kernel Library (MKL) contains ScaLAPACK, LAPACK, Sparse Solver, BLAS, Sparse BLAS, CBLAS, GMP, FFTs, DFTs, VSL, VML, and Interval Arithmetic routines. MKL resides in the directory stored in the environment variable MKL_HOME, after loading a version of the Intel compiler with module.

By using module load to load an Intel compiler, your environment will have several variables set up to help link applications with MKL. Here are some example combinations of simplified linking options:

$ module load intel
$ echo $LINK_LAPACK
-L${MKL_HOME}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

$ echo $LINK_LAPACK95
-L${MKL_HOME}/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

ITaP recommends you use the provided variables to define MKL linking options in your compiling procedures. The Intel compiler modules also provide two other environment variables, LINK_LAPACK_STATIC and LINK_LAPACK95_STATIC, which you may use if you need to link MKL statically.
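
For example, a serial Fortran program that calls LAPACK routines could be compiled and linked against MKL like this (a sketch; the source file name is illustrative):

$ module load intel
$ ifort myprogram.f90 -o myprogram $LINK_LAPACK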

ITaP recommends that you use dynamic linking of libguide. If so, define LD_LIBRARY_PATH such that you are using the correct version of libguide at run time. If you use static linking of libguide, then:

  • If you use the Intel compilers, link in the libguide version that comes with the compiler (use the -openmp option).
  • If you do not use the Intel compilers, link in the libguide version that comes with the Intel MKL above.

Provided Compilers on Radon

Compilers are available on Radon for Fortran, C, and C++. Compiler sets from Intel, GNU, and PGI are installed. A full list of compiler versions installed on Radon is available in the software catalog. More detailed documentation on each compiler set available on Radon follows.

On Radon, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • Intel 13.1.1.163
  • MKL
  • Intel MPI

To load the recommended set:

$ module load devel
$ module list

More information about using these compilers:

    GNU Compilers
    Intel Compilers
    PGI Compilers

GNU Compilers

The official name of the GNU compilers is "GNU Compiler Collection" or "GCC". To discover which versions are available:

$ module avail gcc

Choose an appropriate GCC module and load it. For example:

$ module load gcc

An older version of the GNU compiler will be in your path by default. Do NOT use this version. Instead, load a newer version using the command module load gcc.

Here are some examples for the GNU compilers:

Language Serial Program MPI Program OpenMP Program
Fortran77
$ gfortran myprogram.f -o myprogram
$ mpif77 myprogram.f -o myprogram
$ gfortran -fopenmp myprogram.f -o myprogram
Fortran90
$ gfortran myprogram.f90 -o myprogram
$ mpif90 myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f90 -o myprogram
Fortran95
$ gfortran myprogram.f95 -o myprogram
$ mpif90 myprogram.f95 -o myprogram
$ gfortran -fopenmp myprogram.f95 -o myprogram
C
$ gcc myprogram.c -o myprogram
$ mpicc myprogram.c -o myprogram
$ gcc -fopenmp myprogram.c -o myprogram
C++
$ g++ myprogram.cpp -o myprogram
$ mpiCC myprogram.cpp -o myprogram
$ g++ -fopenmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module.

Intel Compilers

One or more versions of the Intel compiler are available on Radon. To discover which ones:

$ module avail intel

Choose an appropriate Intel module and load it. For example:

$ module load intel

Here are some examples for the Intel compilers:

Language Serial Program MPI Program OpenMP Program
Fortran77
$ ifort myprogram.f -o myprogram
$ mpiifort myprogram.f -o myprogram
$ ifort -openmp myprogram.f -o myprogram
Fortran90
$ ifort myprogram.f90 -o myprogram
$ mpiifort myprogram.f90 -o myprogram
$ ifort -openmp myprogram.f90 -o myprogram
Fortran95 (same as Fortran 90) (same as Fortran 90) (same as Fortran 90)
C
$ icc myprogram.c -o myprogram
$ mpiicc myprogram.c -o myprogram
$ icc -openmp myprogram.c -o myprogram
C++
$ icpc myprogram.cpp -o myprogram
$ mpiicpc myprogram.cpp -o myprogram
$ icpc -openmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module.

PGI Compilers

One or more versions of the PGI compiler are available on Radon. To discover which ones:

$ module avail pgi

Choose an appropriate PGI module and load it. For example:

$ module load pgi

Here are some examples for the PGI compilers:

Language Serial Program MPI Program OpenMP Program
Fortran77
$ pgf77 myprogram.f -o myprogram
$ mpif77 myprogram.f -o myprogram
$ pgf77 -mp myprogram.f -o myprogram
Fortran90
$ pgf90 myprogram.f90 -o myprogram
$ mpif90 myprogram.f90 -o myprogram
$ pgf90 -mp myprogram.f90 -o myprogram
Fortran95
$ pgf95 myprogram.f95 -o myprogram
$ mpif90 myprogram.f95 -o myprogram
$ pgf95 -mp myprogram.f95 -o myprogram
C
$ pgcc myprogram.c -o myprogram
$ mpicc myprogram.c -o myprogram
$ pgcc -mp myprogram.c -o myprogram
C++
$ pgCC myprogram.cpp -o myprogram
$ mpiCC myprogram.cpp -o myprogram
$ pgCC -mp myprogram.cpp -o myprogram

More information on compiler options can be found in the official man pages, which are accessible with the man command after loading the appropriate compiler module.

Running Jobs

There is one method for submitting jobs to Radon. You may use PBS to submit jobs to a queue on Radon. PBS performs job scheduling. Jobs may be any type of program. You may use either the batch or interactive mode to run your jobs. Use the batch mode for finished programs; use the interactive mode only for debugging.

In this section, you'll find a few pages describing the basics of creating and submitting PBS jobs, as well as a number of example PBS jobs that you may be able to adapt to your own needs.

    Basics of PBS Jobs
        Job Submission Script
        Submitting a Job
        Checking Job Status
        Checking Job Output
        Holding a Job
        Job Dependencies
        Canceling a Job

    Example Jobs
        Generic PBS Jobs
            Batch
            Multiple Node
            Interactive Jobs
            Serial Jobs
            MPI
            Hybrid

        Specific Applications
            Gaussian
            Maple
            Mathematica
            Matlab
                Matlab Script (.m File)
                Implicit Parallelism
                Profile Manager
                Parallel Computing Toolbox (parfor)
                Distributed Computing Server (parallel job)

            Octave
            Perl
            Python
            R
            SAS
            Spark
                Spark




Running Jobs Basics of PBS Jobs

Basics of PBS Jobs

The Portable Batch System (PBS) is a system providing job scheduling and job management on compute clusters. With PBS, a user requests resources and submits a job to a queue. The system will then take jobs from queues, allocate the necessary nodes, and execute them.

Do NOT run large, long, multi-threaded, parallel, or CPU-intensive jobs on a front-end login host. All users share the front-end hosts, and running anything but the smallest test job will negatively impact everyone's ability to use Radon. Always use PBS to submit your work as a job.

Submitting a Job

The main steps in submitting a job are preparing a job submission script, submitting it to a queue, monitoring its status, and checking its output.

Follow the links below for information on these steps and other basic information about jobs. A number of example PBS jobs are also available.

    Job Submission Script
    Submitting a Job
    Checking Job Status
    Checking Job Output
    Holding a Job
    Job Dependencies
    Canceling a Job

Running Jobs Basics of PBS Jobs Job Submission Script

Job Submission Script

To submit work to a PBS queue, you must first create a job submission file. This job submission file is essentially a simple shell script. It will set any required environment variables, load any necessary modules, create or modify files and directories, and run any applications that you need:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Loads Matlab and sets the application up
module load matlab

# Change to the directory from which you originally submitted this job.
cd $PBS_O_WORKDIR

# Runs a Matlab script named 'myscript'
matlab -nodisplay -singleCompThread -r myscript

Once your script is prepared, you are ready to submit your job.

Job Script Environment Variables

PBS sets several potentially useful environment variables which you may use within your job submission files. Here is a list of some:

Name Description
PBS_O_WORKDIR Absolute path of the current working directory when you submitted this job
PBS_JOBID Job ID number assigned to this job by the batch system
PBS_JOBNAME Job name supplied by the user
PBS_NODEFILE File containing the list of nodes assigned to this job
PBS_O_HOST Hostname of the system where you submitted this job
PBS_O_QUEUE Name of the original queue to which you submitted this job
PBS_O_SYSTEM Operating system name given by uname -s where you submitted this job
PBS_ENVIRONMENT "PBS_BATCH" if this job is a batch job, or "PBS_INTERACTIVE" if this job is an interactive job
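
As a brief illustration (a sketch, not one of the provided examples; the file name and log name are arbitrary), a job script might use several of these variables to label its output:

#!/bin/sh -l
# FILENAME:  envdemo.sub  (hypothetical)

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR

# Record the job ID, job name, and submission host in a log file,
# along with the list of assigned compute nodes.
echo "Job $PBS_JOBID ($PBS_JOBNAME) submitted from $PBS_O_HOST" > job_${PBS_JOBID}.log
cat $PBS_NODEFILE >> job_${PBS_JOBID}.log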

Running Jobs Basics of PBS Jobs Submitting a Job

Submitting a Job

Once you have a job submission file, you may submit this script to PBS using the qsub command. PBS will find, or wait for, an available processor core or a set of processor cores and run your job there. At submission time, you may also optionally specify many other attributes or job requirements you have regarding where your jobs will run.

To submit your job to one compute node with no special requirements:

$ qsub myjobsubmissionfile

To submit your job to a specific queue:

$ qsub -q myqueuename myjobsubmissionfile

By default, each job receives 30 minutes of wall time for its execution. The wall time is the total time in real clock time (not CPU cycles) that you believe your job will need to run to completion. If you know that your job will not need more than a certain amount of time to run, it is very much to your advantage to request less than the maximum allowable wall time, as this may allow your job to schedule and run sooner. To request the specific wall time of 1 hour and 30 minutes:

$ qsub -l walltime=01:30:00 myjobsubmissionfile

The nodes resource indicates how many compute nodes you would like reserved for your job.

Each compute node in Radon has 16 processor cores. Detailed explanations regarding the distribution of your job across different compute nodes for parallel programs appear in the sections covering specific parallel programming libraries.

To request 2 compute nodes with 16 processor cores per node:

$ qsub -l nodes=2:ppn=16 myjobsubmissionfile

To submit a job using 1 compute node with 4 processor cores:

$ qsub -l nodes=1:ppn=4,naccesspolicy=shared myjobsubmissionfile 

Please note that when naccesspolicy=singleuser is specified, the scheduler ensures that only jobs from the same user are allocated on a node. So, if your singleuser jobs do not fill all the cores on a node, you will still occupy all 16 cores of that node in your queue.
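
For comparison, a singleuser request takes the same form as the shared example above; this is just a sketch of the syntax:

$ qsub -l nodes=1:ppn=4,naccesspolicy=singleuser myjobsubmissionfile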

If more convenient, you may also specify any command line options to qsub from within your job submission file, using a special form of comment:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

#PBS -V
#PBS -q myqueuename
#PBS -l nodes=1:ppn=1,naccesspolicy=shared
#PBS -l walltime=01:30:00
#PBS -N myjobname

# Print the hostname of the compute node on which this job is running.
/bin/hostname

If an option is present in both your job submission file and on the command line, the option on the command line will take precedence.
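
For example, if the file above were submitted with a different wall time given on the command line, the command-line value (a hypothetical 3 hours here) would override the #PBS -l walltime line:

$ qsub -l walltime=03:00:00 myjobsubmissionfile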

After you submit your job with qsub, it can reside in a queue for minutes, hours, or even weeks. How long it takes for a job to start depends on the specific queue, the number of compute nodes requested, the amount of wall time requested, and what other jobs already waiting in that queue requested as well. It is impossible to say for sure when any given job will start. For best results, request no more resources than your job requires.

Once your job is submitted, you can monitor the job status, wait for the job to complete, and check the job output.

Running Jobs Basics of PBS Jobs Checking Job Status

Checking Job Status

Once a job is submitted there are several commands you can use to monitor the progress of the job.

To see your jobs, use the qstat -u command and specify your username:

$ qstat -a -u myusername

radon-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
182792.radon-adm  myusername   workq job1        28422   1   4    --  23:00 R 20:19
185841.radon-adm  myusername   workq job2        24445   1   4    --  23:00 R 20:19
185844.radon-adm  myusername   workq job3        12999   1   4    --  23:00 R 20:18
185847.radon-adm  myusername   workq job4        13151   1   4    --  23:00 R 20:18

To retrieve useful information about your queued or running job, use the checkjob command with your job's ID number. The output should look similar to the following:

$ checkjob -v 163000

job 163000 (RM job '163000.radon-adm.rcac.purdue.edu')

AName: test
State: Idle
Creds:  user:myusername  group:mygroup  class:myqueue
WallTime:   00:00:00 of 20:00:00
SubmitTime: Wed Apr 18 09:08:37
  (Time Queued  Total: 1:24:36  Eligible: 00:00:23)

NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 2
Total Requested Nodes: 1

Req[0]  TaskCount: 2  Partition: ALL
TasksPerNode: 2  NodeCount:  1

Notification Events: JobFail

IWD:            /home/myusername/gaussian
UMask:          0000
OutputFile:     radon-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.o163000
ErrorFile:      radon-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.e163000
User Specified Partition List:   radon-adm,SHARED
Partition List: radon-adm
SrcRM:          radon-adm  DstRM: radon-adm  DstRMJID: 163000.radon-adm.rcac.purdue.edu
Submit Args:    -l nodes=1:ppn=2,walltime=20:00:00 -q myqueue
Flags:          RESTARTABLE
Attr:           checkpoint
StartPriority:  1000
PE:             2.00
NOTE:  job violates constraints for partition radon-adm (job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160))

BLOCK MSG: job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160) (recorded at last scheduling iteration)

There are several useful bits of information in this output.

  • State lets you know if the job is Idle, Running, Completed, or Held.
  • WallTime will show how long the job has run and its maximum time.
  • SubmitTime is when the job was submitted to the cluster.
  • Total Requested Tasks is the total number of cores used for the job.
  • Total Requested Nodes and NodeCount are the number of nodes used for the job.
  • TasksPerNode is the number of cores used per node.
  • IWD is the job's working directory.
  • OutputFile and ErrorFile are the locations of stdout and stderr of the job, respectively.
  • Submit Args will show the arguments given to the qsub command.
  • NOTE/BLOCK MSG will show details on why the job isn't running. The above error says that all the cores are in use on that queue and the job has to wait. Other errors may give insight as to why the job fails to start or is held.

To view the output of a running job, use the qpeek command with your job's ID number. The -f option will continually output to the screen similar to tail -f, while qpeek without options will just output the whole file so far. Here is an example output from an application:

$ qpeek -f 1651025
TIMING: 600  CPU: 97.0045, 0.0926592/step  Wall: 97.0045, 0.0926592/step, 0.11325 hours remaining, 809.902344 MB of memory in use.
ENERGY:     600    359272.8746    280667.4810     81932.7038      5055.7519       -4509043.9946    383233.0971         0.0000         0.0000    947701.9550       -2451180.1312       298.0766  -3398882.0862  -2442581.9707       298.2890           1125.0475        77.0325  10193721.6822         3.5650         3.0569

TIMING: 800  CPU: 118.002, 0.104987/step  Wall: 118.002, 0.104987/step, 0.122485 hours remaining, 809.902344 MB of memory in use.
ENERGY:     800    360504.1138    280804.0922     82052.0878      5017.1543       -4511471.5475    383214.3057         0.0000         0.0000    946597.3980       -2453282.3958       297.7292  -3399879.7938  -2444652.9520       298.0805            978.4130        67.0123  10193578.8030        -0.1088         0.2596

TIMING: 1000  CPU: 144.765, 0.133817/step  Wall: 144.765, 0.133817/step, 0.148686 hours remaining, 809.902344 MB of memory in use.
ENERGY:    1000    361525.2450    280225.2207     81922.0613      5126.4104       -4513315.2802    383460.2355         0.0000         0.0000    947232.8722       -2453823.2352       297.9291  -3401056.1074  -2445219.8163       297.9184            823.8756        43.2552  10193174.7961        -0.7191        -0.2392
...

Running Jobs Basics of PBS Jobs Checking Job Output

Checking Job Output

Once a job has been submitted, has run to completion, and no longer appears in the qstat output, it is complete and ready to have its output examined.

PBS catches output written to standard output and standard error - what would be printed to your screen if you ran your program interactively. Unless you specified otherwise, PBS will put the output in the directory from which you submitted the job.

Standard output will appear in a file whose extension begins with the letter "o", for example myjobsubmissionfile.o1234, where "1234" represents the PBS job ID. Errors that occurred during the job run and were written to standard error will appear in your directory in a file whose extension begins with the letter "e", for example myjobsubmissionfile.e1234.
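
For example, with a hypothetical job ID of 1234 you could view both files from the directory where you submitted the job:

$ cat myjobsubmissionfile.o1234
$ cat myjobsubmissionfile.e1234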

If your program writes its own output files, those files will be created as defined by the program. This may be in the directory where the program was run, or may be defined in a configuration or input file. You will need to check the documentation for your program for more details.

Redirecting Job Output

It is possible to redirect job output to somewhere other than the default location with the -e and -o directives:

#! /bin/sh -l
#PBS -o /home/myusername/joboutput/myjob.out
#PBS -e /home/myusername/joboutput/myjob.out

# This job prints "Hello World" to output and exits
echo "Hello World"

Running Jobs Basics of PBS Jobs Holding a Job

Holding a Job

Sometimes you may want to submit a job but not have it run just yet. For example, you may want to allow labmates to cut in front of you in the queue: hold the job until their jobs have started, then release yours.

To place a hold on a job before it starts running, use the qhold command:

$ qhold myjobid

Once a job has started running, it cannot be placed on hold.

To release a hold on a job, use the qrls command:

$ qrls myjobid

You can find the job ID using the qstat command, as explained in the PBS Job Status section.
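
If you already know at submission time that a job should start out held, qsub also accepts a hold request directly (the standard PBS -h option; shown here as a sketch). The job then remains held until you release it with qrls:

$ qsub -h myjobsubmissionfile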

Running Jobs Basics of PBS Jobs Job Dependencies

Job Dependencies

Dependencies are an automated way of holding and releasing jobs. Jobs with a dependency are held until the condition is satisfied. Once the condition is satisfied, the jobs become eligible to run, but must still queue as normal.

Job dependencies may be configured to ensure jobs start in a specified order. Jobs can be configured to run after other job state changes, such as when the job starts or the job ends.

These examples illustrate setting dependencies in several ways. Typically dependencies are set by capturing and using the job ID from the last job submitted.

To run a job after job myjobid has started:

$ qsub -W depend=after:myjobid myjobsubmissionfile

To run a job after job myjobid ends without error:

$ qsub -W depend=afterok:myjobid myjobsubmissionfile

To run a job after job myjobid ends with errors:

$ qsub -W depend=afternotok:myjobid myjobsubmissionfile

To run a job after job myjobid ends with or without errors:

$ qsub -W depend=afterany:myjobid myjobsubmissionfile

To set more complex dependencies on multiple jobs and conditions:

$ qsub -W depend=after:myjobid1:myjobid2:myjobid3,afterok:myjobid4 myjobsubmissionfile
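
Since qsub prints the ID of the job it creates, a short shell snippet can capture that ID and use it in the dependency of the next submission. This is a minimal sketch; first.sub and second.sub are hypothetical submission files:

#!/bin/sh -l
# Submit the first job and capture its ID (e.g. 12345.radon-adm.rcac.purdue.edu).
FIRST=$(qsub first.sub)

# Submit the second job so that it runs only if the first ends without error.
qsub -W depend=afterok:$FIRST second.sub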

Running Jobs Basics of PBS Jobs Canceling a Job

Canceling a Job

To stop a job before it finishes or remove it from a queue, use the qdel command:

$ qdel myjobid

You can find the job ID using the qstat command, as explained in the PBS Job Status section.

Running Jobs Example Jobs

Example Jobs

A number of example jobs are available for you to look over and adapt to your own needs. The first few are generic examples, and the latter ones go into specifics for particular software packages.

    Generic PBS Jobs
        Batch
        Multiple Node
        Interactive Jobs
        Serial Jobs
        MPI
        Hybrid

    Specific Applications
        Gaussian
        Maple
        Mathematica
        Matlab
            Matlab Script (.m File)
            Implicit Parallelism
            Profile Manager
            Parallel Computing Toolbox (parfor)
            Distributed Computing Server (parallel job)

        Octave
        Perl
        Python
        R
        SAS
        Spark
            Spark



Running Jobs Example Jobs Generic PBS Jobs

The following examples demonstrate the basics of PBS jobs, and are designed to cover common job request scenarios. These example jobs will need to be modified to run your application or code.

    Batch
    Multiple Node
    Interactive Jobs
    Serial Jobs
    MPI
    Hybrid

Running Jobs Example Jobs Generic PBS Jobs Batch

Batch

This simple example submits the job submission file hello.sub to the workq queue on Radon and requests 4 nodes:

$ qsub -q workq -l nodes=4,walltime=00:01:00 hello.sub
99.radon-adm.rcac.purdue.edu

Remember that ppn cannot be larger than the number of processor cores on each node.

After your job finishes running, the ls command will show two new files in your directory, the .o and .e files:

$ ls -l
hello
hello.c
hello.out
hello.sub
hello.sub.e99
hello.sub.o99

If everything went well, then the file hello.sub.e99 will be empty, since it contains any error messages your program gave while running. The file hello.sub.o99 contains the output from your program.
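
The contents of hello.sub are not shown above; a minimal sketch of what it might look like, assuming hello is the compiled serial program listed in the directory, is:

#!/bin/sh -l
# FILENAME:  hello.sub

# Run from the directory the job was submitted from.
cd $PBS_O_WORKDIR

# Run the compiled program; anything it prints ends up in the .o file.
./hello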

Using Environment Variables in a Job

If you would like to see the value of the environment variables from within a PBS job, you can prepare a job submission file with an appropriate filename, here named env.sub:

#!/bin/sh -l
# FILENAME:  env.sub

# Request four nodes, 1 processor core on each.
#PBS -l nodes=4:ppn=1,walltime=00:01:00

# Change to the directory from which you submitted your job.
cd $PBS_O_WORKDIR

# Show details, especially nodes.
# The results of most of the following commands appear in the error file.
echo $PBS_O_HOST
echo $PBS_O_QUEUE
echo $PBS_O_SYSTEM
echo $PBS_O_WORKDIR
echo $PBS_ENVIRONMENT
echo $PBS_JOBID
echo $PBS_JOBNAME

# PBS_NODEFILE contains the names of assigned compute nodes.
cat $PBS_NODEFILE

Submit this job:

$ qsub env.sub

Running Jobs Example Jobs Generic PBS Jobs Multiple Node

Multiple Node

This section illustrates various requests for one or multiple compute nodes and ways of allocating the processor cores on these compute nodes. Each example submits a job submission file (myjobsubmissionfile.sub) to a batch session. The job submission file contains a single command cat $PBS_NODEFILE to show the names of the compute node(s) allocated. The list of compute node names indicates the geometry chosen for the job:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE

All examples use the default queue of the cluster.

One processor core on any compute node

A job shares the other resources, in particular the memory, of the compute node with other jobs. This request is typical of a serial job:

$ qsub -l nodes=1 myjobsubmissionfile.sub

Compute node allocated:

radon-a139

Two processor cores on any compute nodes

This request is typical of a distributed-memory (MPI) job:

$ qsub -l nodes=2 myjobsubmissionfile.sub

Compute node(s) allocated:

radon-a139
radon-a138

All processor cores on one compute node

The option ppn cannot be larger than the number of processor cores on each compute node of the machine in question. This request is typical of a shared-memory (OpenMP) job:

$ qsub -l nodes=1:ppn=16 myjobsubmissionfile.sub

Compute node allocated:

radon-a137

All processor cores on any two compute nodes

The option ppn cannot be larger than the number of processor cores on each compute node of the machine in question. This request is typical of a hybrid (distributed-memory and shared-memory) job:

$ qsub -l nodes=2:ppn=16 myjobsubmissionfile.sub

Compute nodes allocated:

radon-a139
radon-a138

Multinode geometry from option nodes is one processor core per node (scattered placement)

$ qsub -l nodes=8 myjobsubmissionfile.sub

radon-a001
radon-a003
radon-a004
radon-a005
radon-a006
radon-a007
radon-a008
radon-a009

Multinode geometry from option procs is one or more processor cores per node (free placement)

$ qsub -l procs=8 myjobsubmissionfile.sub

The placement of processor cores can range from all on one compute node (packed) to all on unique compute nodes (scattered). A few examples follow:

radon-a001
radon-a001
radon-a001
radon-a001
radon-a001
radon-a001
radon-a001
radon-a001

radon-a001
radon-a001
radon-a001
radon-a002
radon-a002
radon-a003
radon-a004
radon-a004

radon-a000
radon-a001
radon-a002
radon-a003
radon-a004
radon-a005
radon-a006
radon-a007

Four compute nodes, each with two processor cores

$ qsub -l nodes=4:ppn=2 myjobsubmissionfile.sub

radon-a001
radon-a001
radon-a003
radon-a003
radon-a004
radon-a004
radon-a005
radon-a005

Eight processor cores can come from any four compute nodes

$ qsub -l nodes=4 -l procs=8 myjobsubmissionfile.sub

radon-a001
radon-a001
radon-a003
radon-a003
radon-a004
radon-a004
radon-a005
radon-a005

Exclusive access to one compute node, using one processor core

Achieving this geometry requires modifying the job submission file, here named myjobsubmissionfile.sub:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE
uniq <$PBS_NODEFILE >nodefile
echo " "
cat nodefile

To gain exclusive access to a compute node, specify all processor cores that are physically available on a compute node:

$ qsub -l nodes=1:ppn=16 myjobsubmissionfile.sub

radon-a005
radon-a005
...
radon-a005

radon-a005

This request is typical of a serial job that needs access to all of the memory of a compute node.

Running Jobs Example Jobs Generic PBS Jobs Interactive Jobs

Interactive Jobs

Interactive jobs are run on compute nodes, while giving you a shell to interact with. They give you the ability to type commands or use a graphical interface as if you were on a front-end.

If you request an interactive job without a wall time option, PBS assigns to your job the default wall time limit for the queue to which you submit (typically 30 minutes). If this is shorter than the time you actually need, your job will terminate before completion.

To submit an interactive job with one hour of wall time, use the -I option to qsub:

$ qsub -I -l walltime=01:00:00
qsub: waiting for job 100.radon-adm.rcac.purdue.edu to start
qsub: job 100.radon-adm.rcac.purdue.edu ready

If you need to use a remote X11 display from within your job (see the ThinLinc section), add the -X option to qsub as well:

$ qsub -I -X -l nodes=1:ppn=16,walltime=01:00:00
qsub: waiting for job 101.radon-adm.rcac.purdue.edu to start
qsub: job 101.radon-adm.rcac.purdue.edu ready

To quit your interactive job:

logout

Running Jobs Example Jobs Generic PBS Jobs Serial Jobs

Serial Jobs

This section illustrates how to use PBS to submit to a batch session one of the serial programs compiled in the section Compiling Serial Programs.

Suppose that you named your executable file serial_hello. Prepare a job submission file with an appropriate filename, here named serial_hello.sub:

#!/bin/sh -l
# FILENAME:  serial_hello.sub

module load devel
cd $PBS_O_WORKDIR

./serial_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file, or give the full path to the directory containing the executable program.
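
For instance, the last line of the script could instead give the full path to the program (the directory shown is hypothetical):

/home/myusername/serial_tests/serial_hello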

Submit the serial job to the default queue on Radon and request 1 compute node with 1 processor core and 1 minute of wall time:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=00:01:00 ./serial_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
serial_hello
serial_hello.c
serial_hello.sub
serial_hello.sub.emyjobid
serial_hello.sub.omyjobid

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:radon-a139.rcac.purdue.edu   hello, world

If the job failed to run, then view error messages in the file serial_hello.sub.emyjobid.

If a serial job uses a lot of memory and finds the memory of a compute node overcommitted while sharing the compute node with other jobs, specify the number of processor cores physically available on the compute node to gain exclusive use of the compute node:

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 serial_hello.sub

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:radon-a139.rcac.purdue.edu   hello, world

ParaFly

ParaFly is a helper program, available through module load parafly, that can be used to run multiple processes on one node by reading commands from a file. It keeps track of the commands being run and their success or failure, and keeps a specified number of CPU cores on the node busy with the commands in the file.

For instance, assume you have a file called params.txt with the following 500 lines in it:

runcommand param-1
runcommand param-2
runcommand param-3
runcommand param-4
...
runcommand param-500

You can then run ParaFly with this command:

ParaFly -c params.txt -CPU 16 -failed_cmds rerun.txt

and ParaFly will manage the 500 'runcommand' commands, keeping 16 of them active at all times, and copying the ones that failed into a file called rerun.txt.

This gives you a way to execute many (ParaFly has been used with upwards of 10,000 commands in its command file) single-core commands in a single PBS job running on a single exclusively allocated node, rather than submitting each of them as a separate job.
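
If you need to generate a command file like params.txt rather than typing it by hand, a short shell loop will do; this sketch uses the placeholder runcommand name from the example above:

#!/bin/bash
# FILENAME:  makeparams.sh  (hypothetical helper script)

# Write one runcommand line per parameter value into params.txt.
for i in $(seq 1 500); do
    echo "runcommand param-$i"
done > params.txt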

So, if you have the params.txt file in the above example, you could submit the following PBS submission file:

#!/bin/bash
#PBS -q standby
#PBS -l nodes=1:ppn=16
#PBS -l walltime=2:00:00

cd $PBS_O_WORKDIR

module load parafly
ParaFly -c params.txt -CPU 16 -failed_cmds rerun.txt

This would run all 500 'runcommand' commands with their associated parameters on the same node, 16 at a time.

ParaFly command files are not bash scripts themselves; instead they are a list of one-line commands that are executed individually by bash. This means that each command line can use input or output redirection, or different command line options. For example:

command1 -opt1 val1 < input1 > output1
command2 -opt2 val2 < input2 > output2
command3 -opt3 val3 < input3 > output3
...
command500 -opt500 val500 < input500 > output500

Note that there is no guarantee of order of execution using ParaFly, so you cannot rely on output from one command being available as input for another.

Running Jobs Example Jobs Generic PBS Jobs MPI

MPI

An MPI (message-passing) job is a set of processes that take advantage of distributed-memory systems by communicating with each other. Work occurs across several compute nodes of a distributed-memory system. The Message-Passing Interface (MPI) is a standardized specification of the message-passing model, implemented as a collection of library functions. OpenMPI, MVAPICH2, and Intel MPI (IMPI) are implementations of the MPI standard.

This section illustrates how to use PBS to submit to a batch session one of the MPI programs compiled in the section Compiling MPI Programs.

Use module load to set up the paths to access these libraries. Use module avail to see all MPI packages installed on Radon.

Suppose that you named your executable file mpi_hello. Prepare a job submission file with an appropriate filename, here named mpi_hello.sub:

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

module load devel
cd $PBS_O_WORKDIR

mpiexec -n 32 ./mpi_hello

Here, the devel module loads the recommended MPI/compiler combination on Radon.

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission, or give the full path to the directory containing the executable program.

You invoke an MPI program with the mpiexec command. The number of processes is requested with the -n option and is typically equal to the total number of processor cores you request from PBS (more on this below).
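
One way to keep the two in sync is to derive the process count from the node file PBS provides instead of hard-coding it. A sketch of the relevant script lines, as an alternative to the fixed -n 32 above:

# Count the processor cores PBS assigned to this job.
NPROCS=$(wc -l < $PBS_NODEFILE)

mpiexec -n $NPROCS ./mpi_hello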

Submit the MPI job to the default queue on Radon and request 2 whole compute nodes, with 16 MPI ranks on each compute node, and 1 minute of wall time:

$ qsub -l nodes=2:ppn=16,walltime=00:01:00 ./mpi_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
mpi_hello
mpi_hello.c
mpi_hello.sub
mpi_hello.sub.emyjobid
mpi_hello.sub.omyjobid

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:radon-a010.rcac.purdue.edu   Rank:0 of 32 ranks   hello, world
Runhost:radon-a010.rcac.purdue.edu   Rank:1 of 32 ranks   hello, world
...
Runhost:radon-a011.rcac.purdue.edu   Rank:16 of 32 ranks   hello, world
Runhost:radon-a011.rcac.purdue.edu   Rank:17 of 32 ranks   hello, world
...

If the job failed to run, then view error messages in the file mpi_hello.sub.emyjobid.

If an MPI job uses a lot of memory and 16 MPI ranks per compute node use all of the memory of the compute nodes, request more compute nodes, while keeping the total number of MPI ranks unchanged.

Submit the job with double the number of compute nodes and modify the node list to halve the number of MPI ranks per compute node (the total number of MPI ranks remains unchanged):

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

module load devel
cd $PBS_O_WORKDIR

# select every 2nd line #
awk 'NR%2 != 0' < $PBS_NODEFILE > nodefile

mpiexec -n 32 -machinefile ./nodefile ./mpi_hello

$ qsub -l nodes=4:ppn=16,walltime=00:01:00 ./mpi_hello.sub

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:radon-a010.rcac.purdue.edu   Rank:0 of 32 ranks   hello, world
Runhost:radon-a010.rcac.purdue.edu   Rank:1 of 32 ranks   hello, world
...
Runhost:radon-a011.rcac.purdue.edu   Rank:8 of 32 ranks   hello, world
...
Runhost:radon-a012.rcac.purdue.edu   Rank:16 of 32 ranks   hello, world
...
Runhost:radon-a013.rcac.purdue.edu   Rank:24 of 32 ranks   hello, world
...

Notes

  • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Radon is "workq".
  • Invoking an MPI program on Radon with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke an MPI program.
  • In general, the exact order in which MPI ranks output similar write requests to an output file is random.

For an introductory tutorial on how to write your own MPI programs:

Running Jobs Example Jobs Generic PBS Jobs Hybrid

Hybrid

A hybrid job combines both MPI and OpenMP attributes to take advantage of distributed-memory systems with multi-core processors. Work occurs across several compute nodes of a distributed-memory system and across the processor cores of the multi-core processors.

This section illustrates how to use PBS to submit a hybrid program compiled in the section Compiling Hybrid Programs.

The path to relevant MPI libraries is not set up on any compute node by default. Using module load is the way to access these libraries. Use module avail to see all MPI packages installed on Radon.

To run a hybrid program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS 16

In bash:

$ export OMP_NUM_THREADS=16

Suppose that you named your executable file hybrid_hello. Prepare a job submission file with an appropriate filename, here named hybrid_hello.sub:

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile
export OMP_NUM_THREADS=16
mpiexec -n 2 -machinefile nodefile ./hybrid_hello

Here, the devel module loads the recommended MPI/compiler combination on Radon.

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file, or give the full path to the directory containing the executable program.

You invoke a hybrid program with the mpiexec command. You may need to specify how to place the threads on the compute node. Several examples on how to specify thread placement with various MPI libraries are at the bottom of this section.

Submit the hybrid job to the default queue on Radon and request 2 whole compute nodes with 1 MPI rank on each compute node (each using all 16 cores as OpenMP threads) and 1 minute of wall time.

$ qsub -l nodes=2:ppn=16,walltime=00:01:00 hybrid_hello.sub
179168.radon-adm.rcac.purdue.edu

View two new files in your directory (.o and .e):

$ ls -l
hybrid_hello
hybrid_hello.c
hybrid_hello.sub
hybrid_hello.sub.emyjobid
hybrid_hello.sub.omyjobid

View the results from one of the sample hybrid programs about task parallelism:

$ cat hybrid_hello.sub.omyjobid
SERIAL REGION:     Runhost:radon-a044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:radon-a044.rcac.purdue.edu   Thread:0 of 16 threads   hello, world
PARALLEL REGION:   Runhost:radon-a044.rcac.purdue.edu   Thread:1 of 16 threads   hello, world
   ...
PARALLEL REGION:   Runhost:radon-a045.rcac.purdue.edu   Thread:0 of 16 threads   hello, world
PARALLEL REGION:   Runhost:radon-a045.rcac.purdue.edu   Thread:1 of 16 threads   hello, world
   ...

If the job failed to run, then view error messages in the file hybrid_hello.sub.emyjobid.

If a hybrid job uses a lot of memory and 16 OpenMP threads per compute node use all of the memory of the compute nodes, request more compute nodes (MPI ranks) and use fewer processor cores (OpenMP threads) on each compute node.

Prepare a job submission file with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile
export OMP_NUM_THREADS=8
mpiexec -n 4 -machinefile nodefile ./hybrid_hello

Submit the job with double the number of compute nodes (MPI ranks). Be sure to request the whole node or other jobs may use the extra memory your job requires.

$ qsub -l nodes=4:ppn=16,walltime=00:01:00 hybrid_hello.sub

View the results from one of the sample hybrid programs about task parallelism with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

$ cat hybrid_hello.sub.omyjobid
SERIAL REGION:     Runhost:radon-a044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:radon-a044.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:radon-a044.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...
PARALLEL REGION:   Runhost:radon-a045.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:radon-a045.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...
PARALLEL REGION:   Runhost:radon-a046.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:radon-a046.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...
PARALLEL REGION:   Runhost:radon-a047.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:radon-a047.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...

Thread placement

Compute nodes are made up of two or more processor chips, or sockets. Typically each socket shares a memory controller and communication busses for all of its cores. Consider these cores as having "shortcuts" to each other. Cores within a socket will be able to communicate faster and more efficiently amongst themselves than with another socket or compute node. MPI ranks should consequently be placed so that they can utilize these "shortcuts". When running hybrid codes it is essential to specify this placement as by default some MPI libraries will limit a rank to a single core or may scatter a rank across processor chips.

Below are examples of how to specify this placement with several MPI libraries. Hybrid codes should be run within jobs that request the entire node, either with ppn=16 or with the -n exclusive-node flag to qsub; otherwise the job may end up with unexpected and poor thread placement.

OpenMPI 1.6.3

mpiexec -cpus-per-rank $OMP_NUM_THREADS --bycore -np 2 -machinefile nodefile ./hybrid_loop

OpenMPI 1.8

mpiexec -map-by socket:pe=$OMP_NUM_THREADS -np 2 -machinefile nodefile ./hybrid_loop

Intel MPI

mpiexec -np 2 -machinefile nodefile ./hybrid_loop

MVAPICH2

mpiexec -env MV2_ENABLE_AFFINITY 0 -np 2 -machinefile nodefile ./hybrid_loop

Notes

  • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Radon is "workq".
  • Invoking a hybrid program on Radon with ./program is typically wrong, since this will use only one MPI process and defeats the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke a hybrid program.
  • In general, the exact order in which MPI processes of a hybrid program output similar write requests to an output file is random.

Running Jobs Example Jobs Specific Applications

The following examples demonstrate job submission files for some common real-world applications. See the Generic PBS Examples section for more examples on job submissions that can be adapted for use.

    Gaussian
    Maple
    Mathematica
    Matlab
        Matlab Script (.m File)
        Implicit Parallelism
        Profile Manager
        Parallel Computing Toolbox (parfor)
        Distributed Computing Server (parallel job)

    Octave
    Perl
    Python
    R
    SAS
    Spark
        Spark


Running Jobs Example Jobs Specific Applications Gaussian

Gaussian

Gaussian is a computational chemistry software package which works on electronic structure. This section illustrates how to submit a small Gaussian job to a PBS queue. This Gaussian example runs the Fletcher-Powell multivariable optimization.

Prepare a Gaussian input file with an appropriate filename, here named myjob.com. The final blank line is necessary:

#P TEST OPT=FP STO-3G OPTCYC=2

STO-3G FLETCHER-POWELL OPTIMIZATION OF WATER

0 1
O
H 1 R
H 1 R 2 A

R 0.96
A 104.

To submit this job, load Gaussian and then run the provided script, named subg09. This job uses one compute node with 16 processor cores:

$ module load gaussian09
$ subg09 myjob -l nodes=1:ppn=16

View job status:

$ qstat -u myusername

View results in the file for Gaussian output, here named myjob.log. Only the first and last few lines appear here:

 
Entering Gaussian System, Link 0=/apps/rhel6/g09-D.01/g09/g09
 Initial command:
 /apps/rhel6/g09-D.01/g09/l1.exe /scratch/radon/m/myusername/gaussian/Gau-7781.inp -scrdir=/scratch/radon/m/myusername/gaussian/
 Entering Link 1 = /apps/rhel6/g09-D.01/g09/l1.exe PID=      7782.

 Copyright (c) 1988,1990,1992,1993,1995,1998,2003,2009,2010,
            Gaussian, Inc.  All Rights Reserved.

.
.
.

 Job cpu time:  0 days  0 hours  1 minutes 37.3 seconds.
 File lengths (MBytes):  RWF=      5 Int=      0 D2E=      0 Chk=      1 Scr=      1
 Normal termination of Gaussian 09 at Wed Mar 30 10:49:02 2011.
real 17.11
user 92.40
sys 4.97
Machine:
radon-a389
radon-a389
radon-a389
radon-a389
radon-a389
radon-a389
radon-a389
radon-a389

Examples of Gaussian PBS Job Submissions

Submit job using 16 processor cores on a single node:

$ subg09 myjob -l nodes=1:ppn=16,walltime=200:00:00 -q myqueuename

Submit job using 16 processor cores on each of 2 nodes:

$ subg09 myjob -l nodes=2:ppn=16,walltime=200:00:00 -q myqueuename

For more information about Gaussian:

Running Jobs Example Jobs Specific Applications Maple

Maple

Maple is a general-purpose computer algebra system. This section illustrates how to submit a small Maple job to a PBS queue. This Maple example differentiates, integrates, and finds the roots of polynomials.

Prepare a Maple input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Differentiate wrt x.
diff( 2*x^3,x );

# Integrate wrt x.
int( 3*x^2*sin(x)+x,x );

# Solve for x.
solve( 3*x^2+2*x-1,x );

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load maple
cd $PBS_O_WORKDIR

# Use the -q option to suppress startup messages.
# maple -q myjob.in
maple myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=16 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

                                         2
                                      6 x

                                                           2
                      2                                   x
                  -3 x  cos(x) + 6 cos(x) + 6 x sin(x) + ----
                                                          2

                                    1/3, -1

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Maple:

Running Jobs Example Jobs Specific Applications Mathematica

Mathematica

Mathematica implements numeric and symbolic mathematics. This section illustrates how to submit a small Mathematica job to a PBS queue. This Mathematica example finds the three roots of a third-degree polynomial.

Prepare a Mathematica input file with an appropriate filename, here named myjob.in:

(* FILENAME:  myjob.in *)

(* Find roots of a polynomial. *)
p=x^3+3*x^2+3*x+1
Solve[p==0]
Quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load mathematica
cd $PBS_O_WORKDIR

math < myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=16 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

Mathematica 5.2 for Linux x86 (64 bit)
Copyright 1988-2005 Wolfram Research, Inc.
 -- Terminal graphics initialized --

In[1]:=
In[2]:=
In[2]:=
In[3]:=
                     2    3
Out[3]= 1 + 3 x + 3 x  + x

In[4]:=
Out[4]= {{x -> -1}, {x -> -1}, {x -> -1}}

In[5]:=

View the standard error file, myjob.sub.emyjobid:

rmdir: ./ligo/rengel/tasks: Directory not empty
rmdir: ./ligo/rengel: Directory not empty
rmdir: ./ligo: Directory not empty

For more information about Mathematica:

Running Jobs Example Jobs Specific Applications Matlab

MATLAB® (MATrix LABoratory) is a high-level language and interactive environment for numerical computation, visualization, and programming. MATLAB is a product of MathWorks.

MATLAB, Simulink, Compiler, and several of the optional toolboxes are available to faculty, staff, and students. To see the kind and quantity of all MATLAB licenses, plus the number that you are currently using, use the matlab_licenses command:

$ module load matlab
$ matlab_licenses

The MATLAB client can be run on the front-end for application development; however, computationally intensive jobs must be run on compute nodes.

The following sections provide several examples illustrating how to submit MATLAB jobs to a Linux compute cluster.

    Matlab Script (.m File)
    Implicit Parallelism
    Profile Manager
    Parallel Computing Toolbox (parfor)
    Distributed Computing Server (parallel job)

Running Jobs Example Jobs Specific Applications Matlab Matlab Script (.m File)

Matlab Script (.m File)

This section illustrates how to submit a small, serial, MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and gets three random numbers.

Prepare a MATLAB script myscript.m, and a MATLAB function file myfunction.m:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name);

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

Also, prepare a job submission file, here named myjob.sub, which runs MATLAB with the name of the script:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"

# Load module, and set up environment for Matlab to run
module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
# Run Matlab, with the above options and specifying our .m file
matlab -nodisplay -singleCompThread -r myscript

Submit the job, requesting a single compute node:

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

radon-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
197986.radon-adm  myusername   workq myjob.sub    4645   1   1    --  00:01 R 00:00

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
radon-a001.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

hostname:radon-a001.rcac.purdue.edu

0.814724 0.905792 0.126987

Output shows that a processor core on one compute node (radon-a001) processed the job. Output also displays the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about MATLAB:

Running Jobs Example Jobs Specific Applications Matlab Implicit Parallelism

Implicit Parallelism

MATLAB implements implicit parallelism which is automatic multithreading of many computations, such as matrix multiplication, linear algebra, and performing the same operation on a set of numbers. This is different from the explicit parallelism of the Parallel Computing Toolbox.

MATLAB offers implicit parallelism in the form of thread-parallel enabled functions. Since these processor cores, or threads, share a common memory, many MATLAB functions contain multithreading potential. Vector operations, the particular application or algorithm, and the amount of computation (array size) contribute to the determination of whether a function runs serially or with multithreading.

When your job triggers implicit parallelism, it attempts to allocate its threads on all processor cores of the compute node on which the MATLAB client is running, including processor cores running other jobs. This competition can degrade the performance of all jobs running on the node.

When you know that you are coding a serial job but are unsure whether you are using thread-parallel enabled operations, run MATLAB with implicit parallelism turned off. Beginning with R2009b, you can turn multithreading off by starting MATLAB with -singleCompThread:

$ matlab -nodisplay -singleCompThread -r mymatlabprogram

When you are using implicit parallelism, request exclusive access to a compute node by requesting all cores which are physically available on a node:

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 myjob.sub

For more information about MATLAB's implicit parallelism:

Running Jobs Example Jobs Specific Applications Matlab Profile Manager

Profile Manager

MATLAB offers two kinds of profiles for parallel execution: the 'local' profile and user-defined cluster profiles. The 'local' profile runs a MATLAB job on the processor core(s) of the same compute node, or front-end, that is running the client. To run a MATLAB job on compute node(s) different from the node running the client, you must define a Cluster Profile using the Cluster Profile Manager.

To prepare a user-defined cluster profile, use the Cluster Profile Manager in the Parallel menu. This profile contains the PBS details (queue, nodes, ppn, walltime, etc.) of your job submission. Ultimately, your cluster profile will be an argument to MATLAB functions like batch().

For your convenience, ITaP provides a generic cluster profile that can be downloaded: mypbsprofile.settings

To import the profile, start a MATLAB session and select Manage Cluster Profiles... from the Parallel menu. In the Cluster Profile Manager, select Import, navigate to the folder containing the profile, select mypbsprofile.settings and click OK. Remember that the profile will need to be customized for your specific needs. If you have any questions, please contact us.

For detailed information about MATLAB's Parallel Computing Toolbox, examples, demos, and tutorials:

Running Jobs Example Jobs Specific Applications Matlab Parallel Computing Toolbox (parfor)

Parallel Computing Toolbox (parfor)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. It offers a shared-memory computing environment with a maximum of 12 workers (labs, threads; starting in version R2011a) running on the local cluster profile in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses.

This section illustrates the fine-grained parallelism of a parallel for loop (parfor) in a pool job.

The following examples illustrate a method for submitting a small, parallel, MATLAB program with a parallel loop (parfor statement) as a job to a PBS queue. This MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each iteration of the parfor loop.

This method uses the qsub command to submit a MATLAB client which calls the MATLAB batch() function with a user-defined cluster profile.

Prepare a MATLAB pool program in a MATLAB script with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;        % get elapsed time in parallel loop
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)

The execution of a pool job starts with a worker executing the statements of the first serial region up to the parfor block, when it pauses. A set of workers (the pool) executes the parfor block. When they finish, the first worker resumes by executing the second serial region. The code displays the names of the compute nodes running the batch session and the worker pool.

Prepare a MATLAB script that calls MATLAB function batch() which makes a four-lab pool on which to run the MATLAB code in the file myscript.m. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

pjob=batch('myscript','Matlabpool',4,'Profile','mypbsprofile','CaptureDiary',true);
pjob.wait;
pjob.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

matlab -nodisplay -r mylclbatch

Submit the job, requesting a single compute node with one processor core and one PCT license:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=01:00:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m.

Once this job starts, a second job submission is made.

View job status:

$ qstat -u myusername

radon-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
199025.radon-adm  myusername   workq myjob.sub   30197   1   1    --  01:00:00 R 00:00:00
199026.radon-adm  myusername   workq Job1          668   4   4    --  01:00:00 R 00:00:00

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
radon-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013a (8.1.0.604) 64-bit (glnxa64)
                             February 15, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

mylclbatch.m
radon-a000.rcac.purdue.edu
SERIAL REGION:  hostname:radon-a000.rcac.purdue.edu

                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  radon-a001.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  radon-a002.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  radon-a001.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  radon-a002.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  radon-a003.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  radon-a003.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  radon-a004.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  radon-a004.rcac.purdue.edu            4         1          8

SERIAL REGION:  hostname:radon-a000.rcac.purdue.edu
Elapsed time in parallel loop:   5.411486

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsprofile by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments.

For more information about MATLAB Parallel Computing Toolbox:

Running Jobs Example Jobs Specific Applications Matlab Distributed Computing Server (parallel job)

Distributed Computing Server (parallel job)

The MATLAB Parallel Computing Toolbox (PCT) enables a parallel job via the MATLAB Distributed Computing Server (DCS). The tasks of a parallel job are identical, run simultaneously on several MATLAB workers (labs), and communicate with each other. This section illustrates an MPI-like program.

This section illustrates how to submit a small, MATLAB parallel job with four workers running one MPI-like task to a PBS queue. The MATLAB program broadcasts an integer to four workers and gathers the names of the compute nodes running the workers and the lab IDs of the workers.

This example uses the PBS qsub command to submit a MATLAB script with a user-defined PBS cluster profile which scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the parallel job. This job runs completely off the front end.

Prepare a MATLAB script named myscript.m :

% FILENAME:  myscript.m

% Specify pool size.
% Convert the parallel job to a pool job.
matlabpool open 4;
spmd

if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if labindex == 1
    disp(result)
end

end   % spmd
matlabpool close force;
quit;

Also, prepare a job submission file, here named myjob.sub, which runs MATLAB with the name of the script:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r myscript

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit;
$

Submit the job, requesting a single compute node with one processor core, one PCT license, and four DCS licenses:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

Once this job starts, a second job submission is made.

View job status:

$ qstat -u myusername

radon-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
465534.radon-adm myusername   workq myjob.sub    5620   1   1    --  00:05 R 00:00
465545.radon-adm myusername   workq Job2          --    4   4    --  00:01 R   --

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
radon-a006.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
Lab 1:
  radon-a006.rcac.purdue.edu:4:1:1000
  radon-a007.rcac.purdue.edu:4:2:1000
  radon-a008.rcac.purdue.edu:4:3:1000
  radon-a009.rcac.purdue.edu:4:4:1000
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows the name of one compute node (a006) that processed the job submission file myjob.sub. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a006,a007,a008,a009) that processed the four parallel regions.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, first increase the wall time in the qsub command to accommodate a longer-running job. Second, increase the wall time of mypbsconfig by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the SubmitArguments property.
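
For example, to give the same job four hours instead of five minutes, the qsub command might look like the following (the 04:00:00 value is only an illustration; remember to enter a matching wall time in the SubmitArguments property of mypbsconfig so that the worker job gets the same limit):

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=04:00:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub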

For more information about parallel jobs:

Octave

GNU Octave is a high-level, interpreted programming language for numerical computations. Octave is a structured language (similar to C) and is mostly compatible with MATLAB. You may use Octave to avoid the need for a MATLAB license, both during development and as a deployed application. By doing so, you may be able to run your application on more systems or distribute it to others more easily.

This section illustrates how to submit a small Octave job to a PBS queue. This Octave example computes the inverse of a matrix.

Prepare an Octave script file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Invert matrix A.
A = [1 2 3; 4 5 6; 7 8 0]
inv(A)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR

unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

The command octave myjob.m (without the redirection) also works in the preceding script.
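
For example, the last line of myjob.sub could equally be written as follows, combining the direct filename form with the -q option described in the comment above:

octave -q myjob.m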

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

A =

   1   2   3
   4   5   6
   7   8   0

ans =

  -1.77778   0.88889  -0.11111
   1.55556  -0.77778   0.22222
  -0.11111   0.22222  -0.11111

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Octave:

Perl

Perl is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Perl job to a PBS queue. This Perl example prints a single line of text.

Prepare a Perl input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

print "hello, world\n"

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -w option to issue warnings.
/usr/bin/perl -w myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Perl:

Python

Python is a high-level, general-purpose, interpreted, dynamic programming language. We suggest using Anaconda, a Python distribution made for large-scale data processing, predictive analytics, and scientific computing. This section illustrates how to submit a small Python job to a PBS queue. This Python example prints a single line of text.

Prepare a Python input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

import string, sys
print "hello, world"

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load anaconda
cd $PBS_O_WORKDIR
unset DISPLAY

python myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.

Installing packages

If you would like to install a python package for your own personal use, you may do so by following these directions. Make sure you have a download link to the software you want to use and substitute it on the wget line.

$ mkdir ~/src
$ cd ~/src
$ wget http://path/to/source/tarball/app-1.0.tar.gz
$ tar xzvf app-1.0.tar.gz
$ cd app-1.0
$ module load anaconda
$ python setup.py install --user
$ cd ~
$ python
>>> import app
>>> quit()

The "import app" line should return without any output if installed successfully. You can then import the package in your python scripts.

For a list of packages currently installed in the Anaconda Python distribution:

$ module load anaconda
$ conda list
# packages in environment at /apps/rhel6/Anaconda-2.0.1:
#
_license                  1.1                      py27_0
anaconda                  2.0.1                np18py27_0
...

If any other Python packages are needed, please contact us.

For more information about Python:

R

R, a GNU project, is a language and environment for statistics and graphics. It is an open-source implementation of the S programming language. This section illustrates how to submit a small R job to a PBS queue. This R example computes a Pythagorean triple.

Prepare an R input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load r
cd $PBS_O_WORKDIR

# --vanilla: run without reading site or user startup files and without saving or restoring the workspace
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # FILENAME:  myjob.in
>
> # Compute a Pythagorean triple.
> a = 3
> b = 4
> c = sqrt(a*a + b*b)
> c     # display result
[1] 5
>

Any output written to standard error will appear in myjob.sub.emyjobid.

Installing Packages

To install additional R packages, create a folder in your home directory called Rlibs. You will need to be running a recent version of R (2.14.0 or greater as of this writing):

$ mkdir ~/Rlibs

If you are running the bash shell (the default on our clusters), add the following line to your ~/.bashrc. Create ~/.bashrc if it does not already exist; you may also need to run "ln -s .bashrc .bash_profile" if ~/.bash_profile does not exist either:

export R_LIBS=~/Rlibs:$R_LIBS

If you are running csh or tcsh, add the following to your .cshrc:

setenv R_LIBS ~/Rlibs:$R_LIBS

Now run "source .bashrc" and start R:

$ module load r
$ R
> .libPaths()
[1] "/home/myusername/Rlibs"
[2] "/apps/rhel6/R/3.1.0/lib64/R/library"

.libPaths() should output something similar to above if it is set up correctly. Now let's try installing a package.

> install.packages('packagename',"~/Rlibs","http://streaming.stat.iastate.edu/CRAN")

The above command should download and install the requested R package, which upon completion can then be loaded.

> library('packagename')
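
If you prefer to install non-interactively, for example from a shell one-liner or a batch job, something like the following sketch should also work ('packagename' and the repository URL are the same illustrative values used above):

$ module load r
$ Rscript -e 'install.packages("packagename", "~/Rlibs", "http://streaming.stat.iastate.edu/CRAN")'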

If your R package relies on a library that's only installed as a module (for this example we'll use GDAL), you can install it by doing the following:

$ module load gdal
$ module load r
$ R
> install.packages('rgdal', "~/Rlibs", "http://streaming.stat.iastate.edu/CRAN", configure.args="--with-gdal-include=$GDAL_HOME/include --with-gdal-lib=$GDAL_HOME/lib")

Repeat install.packages(...) for any packages that you need. Your R packages should now be installed.

For more information about R:

SAS

SAS is an integrated system supporting statistical analysis, report generation, business planning, and forecasting. This section illustrates how to submit a small SAS job to a PBS queue. This SAS example displays a small dataset.

Prepare a SAS input file with an appropriate filename, here named myjob.sas:

* FILENAME:  myjob.sas

/* Display a small dataset. */
TITLE 'Display a Small Dataset';
DATA grades;
INPUT name $ midterm final;
DATALINES;
Anne     61 64
Bob      71 71
Carla    86 80
David    79 77
Eduardo  73 73
Fannie   81 81
;
PROC PRINT data=grades;
RUN;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load sas
cd $PBS_O_WORKDIR

# -stdio:   run SAS in batch mode:
#              read SAS input from stdin
#              write SAS output to stdout
#              write SAS log to stderr
# -nonews:  do not display SAS news
# SAS runs in batch mode when the name of the SAS command file
# appears as a command-line argument.
sas -stdio -nonews myjob

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

                                                           The SAS System                       10:59 Wednesday, January 5, 2011   1

                                                 Obs    name       midterm    final

                                                  1     Anne          61        64
                                                  2     Bob           71        71
                                                  3     Carla         86        80
                                                  4     David         79        77
                                                  5     Eduardo       73        73
                                                  6     Fannie        81        81

View the SAS log in the standard error file, myjob.sub.emyjobid:

1                                                          The SAS System                           12:32 Saturday, January 29, 2011

NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.2 (TS2M0)
      Licensed to PURDUE UNIVERSITY - T&R, Site 70063312.
NOTE: This session is executing on the Linux 2.6.18-194.17.1.el5rcac2 (LINUX) platform.

NOTE: SAS initialization used:
      real time           0.70 seconds
      cpu time            0.03 seconds

1          * FILENAME:  myjob.sas
2
3          /* Display a small dataset. */
4          TITLE 'Display a Small Dataset';
5          DATA grades;
6          INPUT name $ midterm final;
7          DATALINES;

NOTE: The data set WORK.GRADES has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.18 seconds
      cpu time            0.01 seconds

14         ;
15         PROC PRINT data=grades;
16         RUN;

NOTE: There were 6 observations read from the data set WORK.GRADES.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.32 seconds
      cpu time            0.04 seconds

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           1.28 seconds
      cpu time            0.08 seconds

For more information about SAS:

Spark

Spark is a fast and general engine for large-scale data processing. This section walks through how to submit and run a Spark job using PBS on the compute nodes of Radon.

pbs-spark-submit launches an Apache Spark program within a PBS job, including starting the Spark master and worker processes in standalone mode, running a user-supplied Spark job, and stopping the Spark master and worker processes. The Spark program and its associated services are constrained by the resource limits of the job and are killed off when the job ends. This effectively allows PBS to act as a Spark cluster manager. The following steps assume that you have a Spark program that can run without errors.

To use Spark and pbs-spark-submit, load the following two modules to set up the SPARK_HOME and PBS_SPARK_HOME environment variables:

module load spark
module load pbs-spark-submit

The following example submission script serves as a template for building your own, more complex Spark job submissions. This job requests two whole compute nodes for 10 minutes and submits to the default queue.

#PBS -N spark-pi
#PBS -l nodes=2:ppn=16

#PBS -l walltime=00:10:00
#PBS -q workq
#PBS -o spark-pi.out
#PBS -e spark-pi.err

cd $PBS_O_WORKDIR
module load spark
module load pbs-spark-submit
pbs-spark-submit $SPARK_HOME/examples/src/main/python/pi.py 1000

In the submission script above, the following command submits the pi.py program to the nodes allocated to your job:

pbs-spark-submit $SPARK_HOME/examples/src/main/python/pi.py 1000

You can set various environment variables in your submission script to change the settings of the Spark program. For example, the following line sets SPARK_LOG_DIR to $HOME/log; the default value is the current working directory.

export SPARK_LOG_DIR=$HOME/log

The same environment variables can be set via pbs-spark-submit command-line arguments. For example, the following sets SPARK_LOG_DIR to $HOME/log2:

pbs-spark-submit --log-dir $HOME/log2

The following table summarizes the environment variables that can be set. Note that values set via command-line arguments overwrite values set via shell export, and values set via shell export overwrite the system defaults.

Environment Variable     Default                    Shell Export                                  Command Line Args
SPARK_CONF_DIR           $SPARK_HOME/conf           export SPARK_CONF_DIR=$HOME/conf              --conf-dir or -C
SPARK_LOG_DIR            Current Working Directory  export SPARK_LOG_DIR=$HOME/log                --log-dir or -L
SPARK_LOCAL_DIR          /tmp                       export SPARK_LOCAL_DIR=$RCAC_SCRATCH/local    NA
SCRATCHDIR               Current Working Directory  export SCRATCHDIR=$RCAC_SCRATCH/scratch       --work-dir or -d
SPARK_MASTER_PORT        7077                       export SPARK_MASTER_PORT=7078                 NA
SPARK_DAEMON_JAVA_OPTS   None                       export SPARK_DAEMON_JAVA_OPTS="-Dkey=value"   -D key=value

Note that SCRATCHDIR must be a shared scratch directory accessible from all nodes of a job.

In addition, pbs-spark-submit supports command-line arguments to change the properties of the Spark daemons and of the Spark jobs themselves. For example, the --no-stop argument tells Spark not to stop the master and worker daemons after the Spark application finishes, and the --no-init argument tells Spark not to initialize the Spark master and worker processes. These are intended for use in a sequence of invocations of Spark programs within the same job.

pbs-spark-submit --no-stop   $SPARK_HOME/examples/src/main/python/pi.py 800
pbs-spark-submit --no-init   $SPARK_HOME/examples/src/main/python/pi.py 1000
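
For example, a submission script that runs two Spark programs back to back within one allocation might look like the following minimal sketch (reusing the pi.py example and the module names from above; adjust nodes, walltime, and program paths for your own work):

#!/bin/sh -l
#PBS -N spark-two-runs
#PBS -l nodes=2:ppn=16
#PBS -l walltime=00:20:00
#PBS -q workq

cd $PBS_O_WORKDIR
module load spark
module load pbs-spark-submit

# First run: --no-stop leaves the Spark master and workers running afterwards.
pbs-spark-submit --no-stop $SPARK_HOME/examples/src/main/python/pi.py 800

# Second run: --no-init reuses the daemons started by the first invocation.
pbs-spark-submit --no-init $SPARK_HOME/examples/src/main/python/pi.py 1000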

Use the following command to see the complete list of command-line arguments:

pbs-spark-submit -h

To learn programming in Spark, refer to the Spark Programming Guide. To learn how to submit Spark applications, refer to Submitting Applications.

Common error messages

bash: command not found

Problem

You receive the following message after typing a command:

bash: command not found

Solution

This means the system cannot find your command. Typically, you need to load an application module that provides it.
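
For example, if the missing command is provided by an application module, something like the following usually resolves it (module avail lists the installed modules; octave here is only an illustration):

$ module avail
$ module load octave
$ octave --version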

qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu

Problem

You receive the following message after attempting to delete a job with the 'qdel' command:

qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu

Solution

This error usually indicates that at least one node running your job has stopped responding or crashed. Please forward the job ID to rcac-help@purdue.edu, and ITaP Research Computing staff can help remove the job from the queue.

/usr/bin/xauth: error in locking authority file

Problem

You receive the following message when logging in:

/usr/bin/xauth: error in locking authority file

Solution

Your home directory disk quota is full. You may check your quota with myquota.

You will need to free up space in your home directory.
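
For example, the following is one way to check your quota and find the largest top-level directories in your home directory (standard GNU utilities; myquota is the utility mentioned above):

$ myquota
$ du -h --max-depth=1 ~ | sort -h | tail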

Common Questions

What is the "debug" queue?

The debug queue allows you to quickly start small, short, interactive jobs in order to debug code, test programs, or test configurations. You are limited to one running job at a time in the queue, and each job may use up to two compute nodes for up to 30 minutes.
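
For example, a 30-minute interactive session on one node could be requested as follows (a sketch; ppn=8 assumes you want all eight logical cores on a Radon node, and you may request up to two nodes):

$ qsub -I -q debug -l nodes=1:ppn=8,walltime=00:30:00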

How can my collaborators outside Purdue get access to Radon?

Your Departmental Business Office can submit a Request for Privileges (R4P) to provide access to collaborators outside Purdue, including recent graduates. Instructions are available, and the request can be made on the R4P request form. Once the R4P process is complete, you will need to add your outside collaborators to Radon as you would for any Purdue collaborator.

Do I need to do anything to my firewall to access Radon?

No firewall changes are needed to access Radon. However, to access data through Network Drives (i.e., CIFS, "Z: Drive"), you must be on a Purdue campus network or connected through VPN.