Carter User Guide


    Overview of Carter
        Overview of Carter

    Accounts
        Logging In
            Passwords
            SSH Client Software
            SSH Keys
            ThinLinc
            SSH X11 Forwarding

        Purchasing Nodes

    File Storage and Transfer
        Archive and Compression
        Environment Variables
        Storage Options
            Home Directory
            Long-Term Storage
            Scratch Space
            /tmp Directory

        Storage Quota / Limits
        Sharing Data
        File Transfer
            SCP
            Globus
            Windows Network Drive / SMB
            FTP / SFTP


    Applications
        Environment Management with the Module Command

    Compiling Source Code
        Compiling Serial Programs
        Compiling MPI Programs
        Compiling OpenMP Programs
        Compiling Hybrid Programs
        Compiling GPU Programs
        Intel MKL Library
        Provided Compilers
            GNU Compilers
            Intel Compilers


    Running Jobs
        Basics of PBS Jobs
            Job Submission Script
            Submitting a Job
            Checking Job Status
            Checking Job Output
            Holding a Job
            Job Dependencies
            Canceling a Job
            Node Access Policies
            Queues

        Example Jobs
            Generic PBS Jobs
                Batch
                Multiple Node
                Specific Types of Nodes
                Interactive Jobs
                Serial Jobs
                MPI
                OpenMP
                Hybrid
                GPU

            Specific Applications
                Gaussian
                Maple
                Mathematica
                Matlab
                    Matlab Script (.m File)
                    Implicit Parallelism
                    Profile Manager
                    Parallel Computing Toolbox (parfor)
                    Parallel Toolbox (spmd)
                    Distributed Computing Server (parallel job)

                Octave
                Perl
                Python
                R
                SAS
                Singularity
                Spark
                    Spark

                Tensorflow on Carter
                Windows



    Common Error Messages
        cannot connect to X server
        E233: cannot open display
        How do I check my job output while it is running
        bash: command not found
        qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu
        bash: module command not found
        /usr/bin/xauth: error in locking authority file
        1234.carter-adm.rcac.purdue.edu.SC: line 12: 12345 Killed
        My SSH connection hangs

    Common Questions
        What is the "debug" queue?
        How can my collaborators outside Purdue get access to Carter?
        How can I get email alerts about my PBS job status?
        Can I extend the walltime on a PBS job?
        Do I need to do anything to my firewall to access Carter?
        My scratch files were purged. Can I retrieve them?
        How can I get access to Sentaurus software?
        Can I share data with outside collaborators?
        Can I get a private server from RCAC?

    Biography of Dennis Lee Carter
        Overview of Dennis Lee Carter



Overview of Carter

Carter was launched through an ITaP partnership with Intel in November 2011 and was a member of Purdue's Community Cluster Program. Carter consisted of HP compute nodes with two 8-core Intel Xeon-E5 processors (16 cores per node) and between 32 GB and 256 GB of memory. A few NVIDIA GPU-accelerated nodes were also available. All nodes had 56 Gbps FDR Infiniband connections and a 5-year warranty. Carter was decommissioned on April 30, 2017.

Carter Namesake

Carter was named in honor of Dennis Lee Carter, Purdue alumnus and creator of the "Intel Inside" campaign. More information about his life and impact on Purdue is available in the ITaP biography of Dennis Lee Carter.

Carter Detailed Hardware Specification

All Carter nodes had 16 processor cores, between 32 GB and 256 GB RAM, and 56 Gbps Infiniband interconnects. Carter-G nodes were each equipped with three NVIDIA Tesla GPUs.
Carter-A: 556 nodes; two 8-core Intel Xeon-E5 processors per node (16 cores per node); 32 GB memory per node; 56 Gbps FDR Infiniband; 165.6 TeraFLOPS
Carter-B: 80 nodes; two 8-core Intel Xeon-E5 processors per node (16 cores per node); 64 GB memory per node; 56 Gbps FDR Infiniband; 20.1 TeraFLOPS
Carter-C: 12 nodes; two 8-core Intel Xeon-E5 processors per node (16 cores per node); 256 GB memory per node; 56 Gbps FDR Infiniband; 0.6 TeraFLOPS
Carter-G: 12 nodes; two 8-core Intel Xeon-E5 processors plus three NVIDIA Tesla M2090 GPUs per node (16 cores per node); 128 GB memory per node; 56 Gbps FDR Infiniband; TeraFLOPS n/a


Accounts on Carter

Obtaining an Account

To obtain an account, you must be part of a research group which has purchased access to Carter. Refer to the Accounts / Access page for more details on how to request access.

Outside Collaborators

A valid Purdue Career Account is required for access to any resource. If you do not currently have a valid Purdue Career Account you must have a current Purdue faculty or staff member file a Request for Privileges (R4P) with their Departmental Business Office before you can proceed.

More Accounts Information

    Logging In
        Passwords
        SSH Client Software
        SSH Keys
        ThinLinc
        SSH X11 Forwarding

    Purchasing Nodes


Logging In

To submit jobs on Carter, log in to the submission host carter.rcac.purdue.edu via SSH. This submission host is actually 4 front-end hosts: carter-fe00 through carter-fe03. The login process randomly assigns one of these front-ends to each login to carter.rcac.purdue.edu.

    Passwords
    SSH Client Software
    SSH Keys
    ThinLinc
    SSH X11 Forwarding


Passwords

If you have received a default password as part of the process of obtaining your account, you should change it before you log onto Carter for the first time. Change your password from the SecurePurdue website. You will have the same password on all ITaP systems such as Carter, Purdue email, or Blackboard.

Passwords may need to be changed periodically in accordance with Purdue security policies. Passwords must follow the guidelines described on the SecurePurdue webpage, and ITaP recommends following its guidance on selecting a strong password.

ITaP staff will NEVER ask for your password, by email or otherwise.

Never share your password with another user or make your password known to anyone else.

SSH Client Software

Secure Shell or SSH is a way of establishing a secure connection between two computers. It uses public-key cryptography to authenticate the user with the remote computer and to establish a secure connection. Its usual function involves logging in to a remote machine and executing commands. There are many SSH clients available for all operating systems:

Linux / Solaris / AIX / HP-UX / Unix:

  • The ssh command is pre-installed. Log in using ssh myusername@carter.rcac.purdue.edu from a terminal.

Microsoft Windows:

  • MobaXterm is a small, easy to use, full-featured SSH client. It includes X11 support for remote displays, SFTP capabilities, and limited SSH authentication forwarding for keys.
  • PuTTY is an extremely small download of a free, full-featured SSH client.
  • Pageant is an extremely small program used for SSH authentication.

Mac OS X:

  • The ssh command is pre-installed. You may start a local terminal window from "Applications->Utilities". Log in by typing the command ssh myusername@carter.rcac.purdue.edu.


SSH Keys

SSH works with many different means of authentication such as passwords or SSH keys.

To use SSH keys you will need to generate a keypair on the machine from which you wish to log in. This keypair consists of two files: a private key and a public key. You keep the private key file secure on your local computer (hence the name "private" key). You then log in to the remote machine using your password and append the public key to the end of an authorized keys file there. On future login attempts, SSH uses the keypair to verify your identity instead of prompting for your password.
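
As a minimal sketch, assuming the standard OpenSSH tools ssh-keygen and ssh-copy-id are available on your local machine, generating a keypair and installing the public key on Carter might look like this:

  (generate a keypair on your local machine; accept the default location and provide a passphrase)
$ ssh-keygen -t rsa

  (append the public key to your authorized keys file on Carter; you will be prompted for your password)
$ ssh-copy-id myusername@carter.rcac.purdue.edu

  (later logins use the keypair; you will be asked for the key passphrase rather than your account password)
$ ssh myusername@carter.rcac.purdue.edu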

See the following links for more information on creating SSH keys:

Passphrases and SSH Keys

Creating a keypair prompts you to provide a passphrase for the private key; you should always provide one. This passphrase is not recoverable if forgotten, so make note of it. Only a few situations, such as conducting automated file backups, warrant using a private key that is not protected by a passphrase.


ThinLinc

ITaP Research Computing provides ThinLinc as an alternative to running an X11 server directly on your computer. It allows you to run graphical applications or graphical interactive jobs directly on Carter through a persistent remote graphical desktop session.

ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. This service works very well over a high latency, low bandwidth, or off-campus connection compared to running an X11 server locally. It is also very helpful for Windows users who do not have an easy to use local X11 server, as little to no set up is required on your computer.

There are two ways to use ThinLinc: through the native client (preferred) or through a web browser.

Installing the ThinLinc native client

The native ThinLinc client will offer the best experience, especially over off-campus connections, and is the recommended method for using ThinLinc. It is compatible with Windows, Mac OS X, and Linux.

  • Download the ThinLinc client from the ThinLinc website.
  • Start the ThinLinc client on your computer.
  • In the client's login window, use thinlinc.rcac.purdue.edu as the Server. Use your Purdue Career Account username and password.
  • Click the Connect button.
  • Continue to the following section on connecting to Carter from ThinLinc.

Using ThinLinc through your web browser

The ThinLinc service can be accessed from your web browser as an alternative to installing the native client. This option requires no setup and is a good choice on computers where you do not have privileges to install software. All that is required is an up-to-date web browser. Older versions of Internet Explorer may not work.

  • Open a web browser and navigate to thinlinc.rcac.purdue.edu.
  • Log in with your Purdue Career Account username and password.
  • You may safely proceed past any warning messages from your browser.
  • Continue to the following section on connecting to Carter from ThinLinc.

Connecting to Carter from ThinLinc

  • Once logged in, you will be presented with a remote Linux desktop.
  • Open the terminal application on the remote desktop.
  • Log in to the submission host carter.rcac.purdue.edu with X forwarding enabled using the following command:
    $ ssh -Y carter.rcac.purdue.edu 
  • Once logged in to the Carter head node, you may use graphical editors, debuggers, software like Matlab, or run graphical interactive jobs. For example, to test the X forwarding connection issue the following command to launch the graphical editor gedit:
    $ gedit
  • This session will remain persistent even if you disconnect from the session. Any interactive jobs or applications you left running will continue running even if you are not connected to the session.

Tips for using ThinLinc native client

  • To exit a full screen ThinLinc session press the F8 key on your keyboard (fn + F8 key for Mac users) and click to disconnect or exit full screen.
  • Full screen mode can be disabled when connecting to a session by clicking the Options button and disabling full screen mode from the Screen tab.


SSH X11 Forwarding

SSH supports tunneling of X11 (X-Windows). If you have an X11 server running on your local machine, you may use X11 applications on remote systems and have their graphical displays appear on your local machine. These X11 connections are tunneled and encrypted automatically by your SSH client.

Installing an X11 Server

To use X11, you will need to have a local X11 server running on your personal machine. Both free and commercial X11 servers are available for various operating systems.

Linux / Solaris / AIX / HP-UX / Unix:

  • An X11 server is at the core of all graphical sessions. If you are logged in to a graphical environment on these operating systems, you are already running an X11 server.
  • ThinLinc is an alternative to running an X11 server directly on your Linux computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session.

Microsoft Windows:

  • ThinLinc is an alternative to running an X11 server directly on your Windows computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session.
  • MobaXterm is a small, easy to use, full-featured SSH client. It includes X11 support for remote displays, SFTP capabilities, and limited SSH authentication forwarding for keys.
  • Xming is a free X11 server available for all versions of Windows, although it may occasionally hang and require a restart. Download the "Public Domain Xming" or donate to the development for the newest version.
  • Cygwin is another free X11 server available for all versions of Windows. Download and run setup.exe. During installation, you must select the following packages, which are not included by default: X-startup-scripts, XFree86-lib-compat, xorg-*, xterm, xwinwm, lib-glitz-glx1, and opengl (under the Graphics group, if you also want OpenGL support).
  • Start the Cygwin X server by running XWin -multiwindow, then start an xterm. You may now run your SSH client from the xterm.

Mac OS X:

  • X11 is available as an optional install on the Mac OS X install disks prior to 10.7/Lion. Run the installer, select the X11 option, and follow the instructions. For 10.7+ please download XQuartz.
  • ThinLinc is an alternative to running an X11 server directly on your Mac computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session.

Enabling X11 Forwarding in your SSH Client

Once you are running an X11 server, you will need to enable X11 forwarding/tunneling in your SSH client:

  • "ssh": X11 tunneling should be enabled by default. To be certain it is enabled, you may use ssh -Y.
  • PuTTY: Prior to connection, in your connection's options, under "X11", check "Enable X11 forwarding", and save your connection.
  • MobaXterm: Select "New session" and "SSH." Under "Advanced SSH Settings" check the box for X11 Forwarding.

SSH will set the remote environment variable $DISPLAY to "localhost:XX.YY" when this is working correctly. If you had previously set your $DISPLAY environment variable to your local IP or hostname, you must remove any set/export/setenv of this variable from your login scripts. The environment variable $DISPLAY must be left as SSH sets it, which is to a random local port address. Setting $DISPLAY to an IP or hostname will not work.
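
For example, after logging in with X11 forwarding enabled, you can check that SSH has set the variable (the display number shown here is only illustrative):

$ echo $DISPLAY
localhost:10.0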


Purchasing Nodes - Community Cluster Program

Information Technology at Purdue (ITaP) operates a significant shared cluster computing infrastructure developed over several years through focused acquisitions using funds from grants, faculty startup packages, and institutional sources. These "community clusters" are now at the foundation of Purdue's research cyberinfrastructure.

We strongly encourage any Purdue faculty or staff with computational needs to join this growing community and enjoy the enormous benefits this shared infrastructure provides:

  • Peace of Mind: ITaP system administrators take care of security patches, attempted hacks, operating system upgrades, and hardware repair so faculty and graduate students can concentrate on research.

  • Low Overhead: ITaP data centers provide infrastructure such as networking, racks, floor space, cooling, and power.

  • Cost Effective: ITaP works with vendors to obtain the best price for computing resources by pooling funds from different disciplines to leverage greater group purchasing power.

Through the Community Cluster Program, Purdue affiliates have invested several million dollars in computational and storage resources from Q4 2006 to the present with great success in both the research accomplished and the money saved on equipment purchases.

For more information or to purchase access to our latest cluster today, see the Access Purchase page. To get updates on ITaP's community cluster program, please subscribe to the Community Cluster Program Mailing List.


File Storage and Transfer for Carter

    Archive and Compression
    Environment Variables
    Storage Options
        Home Directory
        Long-Term Storage
        Scratch Space
        /tmp Directory

    Storage Quota / Limits
    Sharing Data
    File Transfer
        SCP
        Globus
        Windows Network Drive / SMB
        FTP / SFTP



Archive and Compression

There are several options for archiving and compressing groups of files or directories on ITaP research systems. The most commonly used options are:

tar


Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.

Examples:

  (list contents of archive somefile.tar)
$ tar tvf somefile.tar

  (extract contents of somefile.tar)
$ tar xvf somefile.tar

  (extract contents of gzipped archive somefile.tar.gz)
$ tar xzvf somefile.tar.gz

  (extract contents of bzip2 archive somefile.tar.bz2)
$ tar xjvf somefile.tar.bz2

  (archive all ".c" files in current directory into one archive file)
$ tar cvf somefile.tar *.c

  (archive and gzip-compress all files in a directory into one archive file)
$ tar czvf somefile.tar.gz somedirectory/

  (archive and bzip2-compress all files in a directory into one archive file)
$ tar cjvf somefile.tar.bz2 somedirectory/

Other arguments for tar can be explored by using the man tar command.

gzip


The standard compression system for all GNU software.

Examples:

  (compress file somefile - also removes uncompressed file)
$ gzip somefile

  (uncompress file somefile.gz - also removes compressed file)
$ gunzip somefile.gz

bzip2


Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.

Examples:

  (compress file somefile - also removes uncompressed file)
$ bzip2 somefile

  (uncompress file somefile.bz2 - also removes compressed file)
$ bunzip2 somefile.bz2

There are several other, less commonly used, options available as well:

  • zip
  • 7zip
  • xz


Environment Variables

Several environment variables are automatically defined for you to help you manage your storage. Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change. Some of the environment variables you should have are:
Name           Description
HOME           path to your home directory
PWD            path to your current directory
RCAC_SCRATCH   path to scratch filesystem

By convention, environment variable names are all uppercase. You may use them on the command line or in any scripts in place of and in combination with hard-coded values:

$ ls $HOME
...

$ ls $RCAC_SCRATCH/myproject
...

To find the value of any environment variable:

$ echo $RCAC_SCRATCH
/scratch/rice/m/myusername

To list the values of all environment variables:

$ env
USER=myusername
HOME=/home/myusername
RCAC_SCRATCH=/scratch/rice/m/myusername
...

You may create or overwrite an environment variable. To pass (export) the value of a variable in bash:

$ export MYPROJECT=$RCAC_SCRATCH/myproject

To assign a value to an environment variable in either tcsh or csh:

$ setenv MYPROJECT value


Storage Options

File storage options on ITaP research systems include long-term storage (home directories, Fortress) and short-term storage (scratch directories, /tmp directory). Each option has different performance and intended uses, and some options vary from system to system as well. ITaP provides daily snapshots of home directories for a limited time for accidental deletion recovery. ITaP does not back up scratch directories or temporary storage and regularly purges old files from scratch and /tmp directories. More details about each storage option appear below.

    Home Directory
    Long-Term Storage
    Scratch Space
    /tmp Directory


Home Directory

ITaP provides home directories for long-term file storage. Each user has one home directory. You should use your home directory for storing important program files, scripts, input data sets, critical results, and frequently used files. You should store infrequently used files on Fortress. Your home directory becomes your current working directory, by default, when you log in.

ITaP provides daily snapshots of your home directory for a limited period of time in the event of accidental deletion. For additional security, you should store another copy of your files on more permanent storage, such as the Fortress HPSS Archive.

Your home directory physically resides on a GPFS storage system in the Research Computing data center. To find the path to your home directory, first log in then immediately enter the following:

$ pwd
/home/myusername

Or from any subdirectory:

$ echo $HOME
/home/myusername

Your home directory and its contents are available on all ITaP research computing machines, including front-end hosts and compute nodes.

Your home directory has a quota limiting the total size of files you may store within. For more information, refer to the Storage Quotas / Limits Section.

Lost File Recovery

Only files which have been snap-shotted overnight are recoverable. If you lose a file the same day you created it, it is NOT recoverable.

To recover files lost from your home directory, use the flost command:

$ flost


Long-Term Storage

Long-term storage, or permanent storage, is available to ITaP research users on the High Performance Storage System (HPSS), an archival storage system called Fortress. Program files, data files, and any other files which are not used often but must be saved can be put in permanent storage. Fortress currently has over 10 PB of capacity.

For more information about Fortress, how it works, user guides, and how to obtain an account:


Scratch Space

ITaP provides scratch directories for short-term file storage only. The quota of your scratch directory is much greater than the quota of your home directory. You should use your scratch directory for storing temporary input files which your job reads or for writing temporary output files which you may examine after execution of your job. You should use your home directory and Fortress for longer-term storage or for holding critical results. The hsi and htar commands provide easy-to-use interfaces into the archive and can be used to copy files into the archive interactively or even automatically at the end of your regular job submission scripts.
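
As a sketch, assuming a results directory in your scratch space (the directory and archive names below are placeholders), copying data into Fortress with htar might look like this:

  (bundle a directory from scratch into a tar archive stored on Fortress)
$ htar -cvf myresults.tar $RCAC_SCRATCH/myresults

  (later, extract the archive contents from Fortress)
$ htar -xvf myresults.tar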

Files in scratch directories are not recoverable. ITaP does not back up files in scratch directories. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

ITaP purges files from scratch directories that have not been accessed or modified in 90 days. Owners of these files receive an email notice one week before removal. Be sure to regularly check your Purdue email account, or set up mail forwarding to an email account you do regularly check. For more information, please refer to our Scratch File Purging Policy.

All users may access scratch directories on Carter. To find the path to your scratch directory:

$ findscratch
/scratch/rice/m/myusername

The value of variable $RCAC_SCRATCH is your scratch directory path. Use this variable in any scripts. Your actual scratch directory path may change without warning, but this variable will remain current.

$ echo $RCAC_SCRATCH
/scratch/rice/m/myusername

All scratch directories are available on each front-end of all computational resources; however, only the /scratch/rice directory is available on Carter compute nodes. No other scratch directories are available on Carter compute nodes.

Your scratch directory has a quota capping the total size and number of files you may store in it. For more information, refer to the Storage Quotas / Limits section.


/tmp Directory

ITaP provides /tmp directories for short-term file storage only. Each front-end and compute node has a /tmp directory. Your program may write temporary data to the /tmp directory of the compute node on which it is running. That data is available for as long as your program is active. Once your program terminates, that temporary data is no longer available. When used properly, /tmp may provide faster local storage to an active process than any other storage option. You should use your home directory and Fortress for longer-term storage or for holding critical results.

ITaP does not perform backups for the /tmp directory and removes files from /tmp whenever space is low or whenever the system needs a reboot. In the event of a disk crash or file purge, files in /tmp are not recoverable. You should copy any important files to more permanent storage.


Storage Quota / Limits

ITaP imposes some limits on your disk usage on research systems. ITaP implements a quota on each filesystem. Each filesystem (home directory, scratch directory, etc.) may have a different limit. If you exceed the quota, you will not be able to save new files or new data to the filesystem until you delete or move data to long-term storage.

Checking Quota

To check the current quotas of your home and scratch directories check the My Quota page or use the myquota command:

$ myquota
Type        Filesystem           Size     Limit  Use      Files     Limit  Use
==============================================================================
home        extensible          5.0GB    10.0GB  50%          -         -    -
scratch     /scratch/rice/        8KB   476.8GB   0%          2   100,000   0%

The columns are as follows:

  • Type: indicates home or scratch directory.
  • Filesystem: name of storage option.
  • Size: sum of file sizes in bytes.
  • Limit: allowed maximum on sum of file sizes in bytes.
  • Use: percentage of file-size limit currently in use.
  • Files: number of files and directories (not the size).
  • Limit: allowed maximum on number of files and directories. It is possible, though unlikely, to reach this limit and not the file-size limit if you create a large number of very small files.
  • Use: percentage of file-number limit currently in use.

If you find that you reached your quota in either your home directory or your scratch file directory, obtain estimates of your disk usage. Find the top-level directories which have a high disk usage, then study the subdirectories to discover where the heaviest usage lies.

To see, in a human-readable format, an estimate of the disk usage of the top-level directories within your home directory:

$ du -h --max-depth=1 $HOME
32K     /home/myusername/mysubdirectory_1
529M    /home/myusername/mysubdirectory_2
608K    /home/myusername/mysubdirectory_3

The second directory is the largest of the three, so apply the du command to it as well.
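
For example, using the largest directory from the listing above:

$ du -h --max-depth=1 $HOME/mysubdirectory_2
...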

To see, in a human-readable format, an estimate of the disk usage of the top-level directories within your scratch directory:

$ du -h --max-depth=1 $RCAC_SCRATCH
160K    /scratch/rice/m/myusername

This strategy can be very helpful in figuring out the location of your largest usage. Move unneeded files and directories to long-term storage to free space in your home and scratch directories.

Increasing Quota

Home Directory

If you find you need additional disk space in your home directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. Unfortunately, it is not possible to increase your home directory quota beyond its current level.

Scratch Space

If you find you need additional disk space in your scratch space, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may ask for a quota increase at rcac-help@purdue.edu.


Sharing Data

Data on any Research Computing resource can be shared with other users within Purdue or with collaborators at other institutions. Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation on how to share data:


File Transfer

Carter supports several methods for file transfer. Use the links below to learn more about these methods.

    SCP
    Globus
    Windows Network Drive / SMB
    FTP / SFTP


SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH protocol. SCP is available as a protocol choice in some graphical file transfer programs and also as a command line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.

Command-line usage:

  (to a remote system from local)
$ scp sourcefilename myusername@carter.rcac.purdue.edu:somedirectory/destinationfilename

  (from a remote system to local)
$ scp myusername@carter.rcac.purdue.edu:somedirectory/sourcefilename destinationfilename

  (recursive directory copy to a remote system from local)
$ scp -r sourcedirectory/ myusername@carter.rcac.purdue.edu:somedirectory/

Linux / Solaris / AIX / HP-UX / Unix:

  • The "scp" command-line program should already be installed.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SCP and SFTP client.
  • Cyberduck is another full-featured and free graphical SFTP and SCP client.
  • PuTTY also offers "pscp.exe", which is an extremely small program and a basic command-line SCP client.
  • Secure FX is a commercial SCP and SFTP graphical client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • You should have already installed the "scp" command-line program. You may start a local terminal window from "Applications->Utilities".
  • Cyberduck is a full-featured and free graphical SFTP and SCP client.


Globus

Globus, previously known as Globus Online, is a powerful and easy to use file transfer service that is useful for transferring files virtually anywhere. It works within ITaP's various research storage systems; it connects between ITaP and remote research sites running Globus; and it connects research systems to personal systems. You may use Globus to connect to your home, scratch, and Fortress storage directories. Since Globus is web-based, it works on any operating system that is connected to the internet. The Globus Personal client is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used over the command line.

Globus Web:

  • Navigate to http://transfer.rcac.purdue.edu
  • Click "Proceed" to log in with your Purdue Career Account.
  • On your first login it will ask you to associate your Purdue Career Account with a Globus account. If you already have one, sign in to make the association. Otherwise, click the link to create a new account.
  • Now you're at the main screen. Click "File Transfer" which will bring you to a two-endpoint interface.
  • You will need to select one endpoint on one side as the source, and a second endpoint on the other as the destination. This can be one of several Purdue endpoints or another University or your personal computer (see Personal Client section below).

The ITaP Research Computing endpoints are as follows. A search for "Purdue" will give you several suggested results you can choose from, or you can give a more specific search.

  • Home Directory storage: "Purdue Research Computing - Home Directories", however, you can start typing "Purdue" or "Home Directories" and it will suggest appropriate matches.
  • Carter scratch storage: "Purdue Carter Cluster", however, you can start typing "Purdue" or "Carter" and it will suggest appropriate matches. From here you will need to navigate into the first letter of your username, and then into your username.
  • Research Data Depot: "Purdue Research Computing - Data Depot", a search for "Depot" should provide appropriate matches to choose from.
  • Fortress: "Purdue Fortress HPSS Archive", a search for "Fortress" should provide appropriate matches to choose from.

From here, select a file or folder in either side of the two-pane window, and then use the arrows in the top-middle of the interface to instruct Globus to move files from one side to the other. You can transfer files in either direction. You will receive an email once the transfer is completed.

Globus Personal Client setup:

  • On the endpoint page from earlier, click "Get Globus Connect Personal" or download it from here: Globus Connect Personal
  • Name this particular personal system and click "Generate Setup Key" on this page: Create Globus Personal endpoint
  • Copy the key and paste it into the setup box when installing the client for your system.
  • Your personal system is now available as an endpoint within the Globus transfer interface.

Globus Command Line:

Sharing Data with Outside Collaborators

Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation on how to share data:

For more information, please see Globus Support.


Windows Network Drive / SMB

SMB (Server Message Block), also known as CIFS, is an easy to use file transfer protocol that is useful for transferring files between ITaP research systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used over the command line.

Note: to access Carter through SMB file sharing, you must be on a Purdue campus network or connected through VPN.

Windows:

  • Windows 7: Click Windows menu > Computer, then click Map Network Drive in the top bar
  • Windows 8.1: Tap the Windows key, type computer, select This PC, click Computer > Map Network Drive in the top bar
  • In the folder location enter the following information and click Finish:

    • To access your home directory, enter \\samba.rcac.purdue.edu\myusername.
    • To access your scratch space on Carter, enter \\samba.rcac.purdue.edu\scratch. Once mapped, you will be able to navigate to carter\m\myusername. You may also navigate to any of the other cluster scratch directories from this drive mapping.
    • To access your Fortress long-term storage home directory, enter \\fortress-smb.rcac.purdue.edu\myusername.
    • To access a shared Fortress group storage directory, enter \\fortress-smb.rcac.purdue.edu\group\mygroupname where mygroupname is the name of the shared group space.

  • Your home, scratch, or Fortress directory should now be mounted as a drive in the Computer window.

Mac OS X:

  • In the Finder, click Go > Connect to Server
  • In the Server Address enter the following information and click Connect:

    • To access your home directory, enter smb://samba.rcac.purdue.edu/myusername.
    • To access your scratch space on Carter, enter smb://samba.rcac.purdue.edu/scratch. Once connected, you will be able to navigate to carter/m/myusername. You may also navigate to any of the other cluster scratch directories from this mount.
    • To access your Fortress long-term storage home directory, enter smb://fortress-smb.rcac.purdue.edu/myusername.
    • To access a shared Fortress group storage directory, enter smb://fortress-smb.rcac.purdue.edu/group/mygroupname where mygroupname is the name of the shared group space.

  • You may be prompted for login information. Enter your username and password, and for the domain enter onepurdue; otherwise you will not be able to log in.

Linux:

  • There are several graphical methods to connect in Linux depending on your desktop environment. Once you find out how to connect to a network server on your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
  • If you would like access via samba on the command line you may install smbclient which will give you ftp-like access and can be used as shown below. For all the possible ways to connect look at the Mac OS X instructions.
    smbclient //samba.rcac.purdue.edu/myusername -U myusername


FTP / SFTP

ITaP does not support FTP on any ITaP research systems because it does not allow for secure transmission of data. Use SFTP instead, as described below.

SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Command-line SFTP cannot recursively copy directory contents; to do so, use SCP or a graphical SFTP client.

Command-line usage:

$ sftp -B buffersize myusername@carter.rcac.purdue.edu

      (to a remote system from local)
sftp> put sourcefile somedir/destinationfile
sftp> put -P sourcefile somedir/

      (from a remote system to local)
sftp> get sourcefile somedir/destinationfile
sftp> get -P sourcefile somedir/

sftp> exit
  • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
  • -P: optional, preserve file attributes and permissions

Linux / Solaris / AIX / HP-UX / Unix:

  • The "sftp" command-line program should already be installed.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SFTP and SCP client.
  • Cyberduck is another full-featured and free graphical SFTP and SCP client.
  • PuTTY also offers "psftp.exe", which is an extremely small program and a basic command-line SFTP client.
  • Secure FX is a commercial SFTP and SCP graphical client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • The "sftp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • Cyberduck is a full-featured and free graphical SFTP and SCP client.


Provided Applications

The Carter cluster provides a number of software packages to users of the system via the module command.

As of February 1, 2017, all clusters use hierarchical modules. This guide has been updated to reflect this new configuration.

Environment Management with the Module Command

ITaP uses the module command as the preferred method to manage your processing environment. With this command, you may load applications and compilers along with their libraries and paths. Modules are packages which you load and unload as needed.

Please use the module command and do not manually configure your environment, as ITaP staff may make changes to the specifics of various packages. If you use the module command to manage your environment, these changes will be transparent to you.

Hierarchy

Many modules have dependencies on other modules. For example, a particular openmpi module requires a specific version of the Intel compiler to be loaded. Often these dependencies are not obvious to users of the module, and there are many modules which may conflict. Arranging modules in a hierarchical fashion makes these dependencies clear. This arrangement also helps keep the software stack easy to understand: your view of the modules will not be cluttered with conflicting packages.

Your default module view on Carter will include a set of compilers and the set of basic software that has no dependencies (such as Matlab and Fluent). To make software available that depends on a compiler, you must first load the compiler, and then software which depends on it becomes available to you. In this way, all software you see when doing "module avail" is completely compatible with each other.

Using the Hierarchy

Your default module view on Carter will include a set of compilers, and the set of basic software that has no dependencies (such as Matlab and Fluent).

To see what modules are available on this system by default:

$ module avail

To see which versions of a specific compiler are available on this system:

$ module avail gcc
$ module avail intel

To continue further into the hierarchy of modules, you will need to choose a compiler. As an example, if you are planning on using the Intel compiler you will first want to load the Intel compiler:

$ module load intel

With intel loaded, you can repeat the avail command, and at the bottom of the output you will see a section of additional software that the intel module provides:

$ module avail

Several of these new packages also provide additional software packages, such as MPI libraries. You can repeat the last two steps with one of the MPI packages such as openmpi and you will have a few more software packages available to you.

If you are looking for a specific software package and do not see it in your default view, the module command provides a search function for searching the entire hierarchy of modules, without the need for you to manually load and avail every module.

To search for a software package:

$ module spider openmpi
----------------------------------------------------------------------------
  openmpi:
----------------------------------------------------------------------------
     Versions:
        openmpi/1.8.1
        openmpi/1.10.1

This will search for the openmpi software package. If you do not specify a specific version of the package, you will be given a list of versions available on the system. Select the version you wish to use and spider that to see how to access the module:

$ module spider openmpi/1.8.1
...
  You will need to load one of the set of module(s) below before the "openmpi/1.8.1" module is available to load.

      gcc/4.7.2
      gcc/5.2.0
      intel/13.1.1.163
      intel/14.0.2.144
      intel/15.0.3.187
      intel/16.0.1.150
...

The output of this command will tell you whether you can load the module directly or, as in the example above, whether you need to load one or two other modules first. With the information provided by this command, you can now construct a load command to load a version of OpenMPI into your environment:

$ module load intel/16.0.1.150 openmpi/1.8.1

Some user communities may maintain copies of their domain software for others to use. For example, the Purdue Bioinformatics Core provides a wide set of bioinformatics software for use by any user of ITaP clusters via the bioinfo module. The spider command will also search this repository of modules. If it finds a software package available in the bioinfo module repository, the spider command will instruct you to load the bioinfo module first.

Load / Unload a Module

All modules consist of both a name and a version number. When loading a module, you may use only the name to load the default version, or you may specify which version you wish to load.

For each cluster, ITaP makes a recommendation regarding the set of compiler, math library, and MPI library for parallel code. To load the recommended set:

$ module load rcac

To verify what you loaded:

$ module list

To load the default version of a specific compiler, choose one of the following commands:

$ module load gcc
$ module load intel

To load a specific version of a compiler, include the version number:

$ module load intel/13.1.1.163

When running a job, you must load any relevant modules on the compute node(s) from within your job submission file. Loading modules on the front-end before submitting your job makes the software available to your session on the front-end, but not to your job's environment. You must load the necessary modules in your job submission script, as shown in the sketch below.
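
As a minimal sketch of this pattern (the resource requests, module versions, and program name below are illustrative placeholders, not a prescription for your job), a job submission script that loads its own modules might look like this:

#!/bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l walltime=00:30:00

# change to the directory from which the job was submitted
cd $PBS_O_WORKDIR

# load the required modules on the compute node(s)
module load intel/16.0.1.150 openmpi/1.8.1

# run the program (myprogram is a placeholder name)
mpirun -np 16 ./myprogram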

To unload a compiler or software package you loaded previously:

$ module unload gcc
$ module unload intel
$ module unload matlab

To unload all currently loaded modules and reset your environment:

$ module purge

Show Module Details

To learn more about what a module does to your environment, you may use the module show command. Here is an example showing what loading the default Matlab does to the processing environment:

$ module show matlab
----------------------------------------------------------------------------
 /opt/modules/modulefiles/matlab/R2013a:
----------------------------------------------------------------------------
whatis      invoke MATLAB Release R2013a
setenv      MATLAB "/apps/rhel6/MATLAB/R2013a"
setenv      MLROOT "/apps/rhel6/MATLAB/R2013a"
setenv      ARCH "glnxa64"
prepend_path    PATH "/apps/rhel6/MATLAB/R2013a/bin/glnxa64"
prepend_path    PATH "/apps/rhel6/MATLAB/R2013a/bin"
prepend_path    LD_LIBRARY_PATH "/apps/rhel6/MATLAB/R2013a/runtime/glnxa64"
prepend_path    LD_LIBRARY_PATH "/apps/rhel6/MATLAB/R2013a/bin/glnxa64"
help([[ matlab - Technical Computing Environment
]])

Compiling Serial Programs

A serial program is a single process which executes as a sequential stream of instructions on one processor core. Compilers capable of serial programming are available for C, C++, and versions of Fortran.

Here are a few sample serial programs:
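
As a stand-in illustration, a minimal serial program in C can be as simple as:

/* serial_hello.c - a minimal serial program (illustrative example) */
#include <stdio.h>

int main(void)
{
    printf("Hello from a serial program\n");
    return 0;
}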

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc

The following examples illustrate how to compile your serial program with the Intel and GNU compilers:
Fortran 77
  Intel:  $ ifort myprogram.f -o myprogram
  GNU:    $ gfortran myprogram.f -o myprogram

Fortran 90
  Intel:  $ ifort myprogram.f90 -o myprogram
  GNU:    $ gfortran myprogram.f90 -o myprogram

Fortran 95
  Intel:  $ ifort myprogram.f90 -o myprogram
  GNU:    $ gfortran myprogram.f95 -o myprogram

C
  Intel:  $ icc myprogram.c -o myprogram
  GNU:    $ gcc myprogram.c -o myprogram

C++
  Intel:  $ icc myprogram.cpp -o myprogram
  GNU:    $ g++ myprogram.cpp -o myprogram

The Intel and GNU compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".


Compiling MPI Programs

OpenMPI and Intel MPI (IMPI) are implementations of the Message-Passing Interface (MPI) standard. Libraries for these MPI implementations and compilers for C, C++, and Fortran are available on all clusters. A full list of MPI library versions installed on Carter is available in the software catalog.

MPI programs require including a header file:
Language    Header Files
Fortran 77  INCLUDE 'mpif.h'
Fortran 90  INCLUDE 'mpif.h'
Fortran 95  INCLUDE 'mpif.h'
C           #include <mpi.h>
C++         #include <mpi.h>

Here are a few sample programs using MPI:
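
As a stand-in illustration, a minimal MPI program in C that reports the rank of each process might look like this:

/* mpi_hello.c - a minimal MPI program (illustrative example) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down MPI */
    return 0;
}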

To see the available MPI libraries:

$ module avail openmpi 
$ module avail impi

The following examples illustrate how to compile your MPI program. Any compiler flags accepted by the Intel ifort/icc compilers are compatible with their respective MPI compiler.
Fortran 77
  Intel MPI:                    $ mpiifort program.f -o program
  OpenMPI or Intel MPI (IMPI):  $ mpif77 program.f -o program

Fortran 90
  Intel MPI:                    $ mpiifort program.f90 -o program
  OpenMPI or Intel MPI (IMPI):  $ mpif90 program.f90 -o program

Fortran 95
  Intel MPI:                    $ mpiifort program.f95 -o program
  OpenMPI or Intel MPI (IMPI):  $ mpif90 program.f95 -o program

C
  Intel MPI:                    $ mpiicc program.c -o program
  OpenMPI or Intel MPI (IMPI):  $ mpicc program.c -o program

C++
  Intel MPI:                    $ mpiicpc program.C -o program
  OpenMPI or Intel MPI (IMPI):  $ mpiCC program.C -o program

The Intel and GNU compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Here is some more documentation from other sources on the MPI libraries:


Compiling OpenMP Programs

All compilers installed on Carter include OpenMP functionality for C, C++, and Fortran. An OpenMP program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over processor cores in a single compute node without the need for MPI communications.

OpenMP programs require including a header file:
Language    Header Files
Fortran 77  INCLUDE 'omp_lib.h'
Fortran 90  use omp_lib
Fortran 95  use omp_lib
C           #include <omp.h>
C++         #include <omp.h>

Sample programs illustrate task parallelism of OpenMP:

A sample program illustrates loop-level (data) parallelism of OpenMP:
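
As a stand-in illustration of loop-level parallelism, the following C program distributes the iterations of a loop across OpenMP threads:

/* omp_sum.c - a minimal OpenMP loop-parallel program (illustrative example) */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;
    int i;

    /* split the loop iterations across threads and combine the partial sums */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++) {
        sum += (double) i;
    }

    printf("sum = %.0f (using up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}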

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc

The following examples illustrate how to compile your shared-memory program. Any compiler flags accepted by the ifort/icc compilers are compatible with OpenMP.
Fortran 77
  Intel:  $ ifort -openmp myprogram.f -o myprogram
  GNU:    $ gfortran -fopenmp myprogram.f -o myprogram

Fortran 90
  Intel:  $ ifort -openmp myprogram.f90 -o myprogram
  GNU:    $ gfortran -fopenmp myprogram.f90 -o myprogram

Fortran 95
  Intel:  $ ifort -openmp myprogram.f90 -o myprogram
  GNU:    $ gfortran -fopenmp myprogram.f95 -o myprogram

C
  Intel:  $ icc -openmp myprogram.c -o myprogram
  GNU:    $ gcc -fopenmp myprogram.c -o myprogram

C++
  Intel:  $ icc -openmp myprogram.cpp -o myprogram
  GNU:    $ g++ -fopenmp myprogram.cpp -o myprogram

The Intel and GNU compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Here is some more documentation from other sources on OpenMP:


Compiling Hybrid Programs

A hybrid program combines both MPI and shared-memory to take advantage of compute clusters with multi-core compute nodes. Libraries for OpenMPI and Intel MPI (IMPI) and compilers which include OpenMP for C, C++, and Fortran are available.

Hybrid programs require including header files:
Language    Header Files
Fortran 77  INCLUDE 'omp_lib.h'
            INCLUDE 'mpif.h'
Fortran 90  use omp_lib
            INCLUDE 'mpif.h'
Fortran 95  use omp_lib
            INCLUDE 'mpif.h'
C           #include <mpi.h>
            #include <omp.h>
C++         #include <mpi.h>
            #include <omp.h>

A few examples illustrate hybrid programs with task parallelism of OpenMP:

This example illustrates a hybrid program with loop-level (data) parallelism of OpenMP:
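
As a stand-in illustration (not the original example), a hybrid program can split work across MPI ranks and parallelize each rank's loop with OpenMP threads:

/* hybrid_sum.c - a minimal MPI + OpenMP hybrid program (illustrative example) */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    const int n = 1000000;
    int rank, size, i;
    double local_sum = 0.0, total_sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each MPI rank sums its share of the range using OpenMP threads */
    #pragma omp parallel for reduction(+:local_sum)
    for (i = rank; i < n; i += size) {
        local_sum += (double) i;
    }

    /* combine the per-rank partial sums on rank 0 */
    MPI_Reduce(&local_sum, &total_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %.0f\n", total_sum);

    MPI_Finalize();
    return 0;
}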

To see the available MPI libraries:

$ module avail impi    
$ module avail openmpi    

The following examples illustrate how to compile your hybrid (MPI/OpenMP) program. Any compiler flags accepted by the Intel ifort/icc compilers are compatible with their respective MPI compiler.
Fortran 77
  Intel MPI:                            $ mpiifort -openmp myprogram.f -o myprogram
  OpenMPI or IMPI with Intel compiler:  $ mpif77 -openmp myprogram.f -o myprogram
  OpenMPI or IMPI with GNU compiler:    $ mpif77 -fopenmp myprogram.f -o myprogram

Fortran 90
  Intel MPI:                            $ mpiifort -openmp myprogram.f90 -o myprogram
  OpenMPI or IMPI with Intel compiler:  $ mpif90 -openmp myprogram.f90 -o myprogram
  OpenMPI or IMPI with GNU compiler:    $ mpif90 -fopenmp myprogram.f90 -o myprogram

Fortran 95
  Intel MPI:                            $ mpiifort -openmp myprogram.f90 -o myprogram
  OpenMPI or IMPI with Intel compiler:  $ mpif90 -openmp myprogram.f90 -o myprogram
  OpenMPI or IMPI with GNU compiler:    $ mpif90 -fopenmp myprogram.f95 -o myprogram

C
  Intel MPI:                            $ mpiicc -openmp myprogram.c -o myprogram
  OpenMPI or IMPI with Intel compiler:  $ mpicc -openmp myprogram.c -o myprogram
  OpenMPI or IMPI with GNU compiler:    $ mpicc -fopenmp myprogram.c -o myprogram

C++
  Intel MPI:                            $ mpiicpc -openmp myprogram.C -o myprogram
  OpenMPI or IMPI with Intel compiler:  $ mpiCC -openmp myprogram.C -o myprogram
  OpenMPI or IMPI with GNU compiler:    $ mpiCC -fopenmp myprogram.C -o myprogram

The Intel and GNU compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".


Compiling GPU Programs

The Carter-G nodes of the Carter cluster each contain three NVIDIA Tesla M2090 GPUs that support CUDA and OpenCL. See the detailed hardware overview for the specifics on the GPUs in Carter. This section focuses on using CUDA.

A simple CUDA program has a basic workflow:

  • Initialize an array on the host (CPU).
  • Copy array from host memory to GPU memory.
  • Apply an operation to array on GPU.
  • Copy array from GPU memory to host memory.

Here is a sample CUDA program:
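
As an illustrative sketch of the workflow above (not the original gpu_hello.cu sample), a small CUDA program might look like this:

/* gpu_add.cu - a minimal CUDA program (illustrative sketch) */
#include <stdio.h>
#include <cuda_runtime.h>

/* kernel: add 1.0 to each element of the array on the GPU */
__global__ void add_one(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[i] += 1.0f;
}

int main(void)
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float h_a[n];
    float *d_a;

    for (int i = 0; i < n; i++)                           /* 1. initialize the array on the host */
        h_a[i] = (float) i;

    cudaMalloc((void **) &d_a, bytes);                    /* allocate GPU memory */
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  /* 2. copy host -> GPU */

    add_one<<<(n + 255) / 256, 256>>>(d_a, n);            /* 3. apply the operation on the GPU */

    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);  /* 4. copy GPU -> host */
    cudaFree(d_a);

    printf("h_a[0] = %.1f, h_a[%d] = %.1f\n", h_a[0], n - 1, h_a[n - 1]);
    return 0;
}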

Both front-ends and GPU-enabled compute nodes have the CUDA tools and libraries available to compile CUDA programs. To compile a CUDA program, load CUDA, and use nvcc to compile the program:

$ module load cuda
$ nvcc gpu_hello.cu -o gpu_hello
$ ./gpu_hello

The example illustrates only how to copy an array between a CPU and its GPU but does not perform a serious computation.

The following program times three square matrix multiplications on a CPU and on the global and shared memory of a GPU:

$ module load cuda
$ nvcc mcc.cu -o mm
$ ./mm
                                                            speedup
                                                            -------
Elapsed time in CPU:                    8435.3 milliseconds
Elapsed time in GPU (global memory):      46.9 milliseconds  180.0
Elapsed time in GPU (shared memory):      29.9 milliseconds  282.3

For best performance, the input array or matrix must be sufficiently large to overcome the overhead in copying the input and output data to and from the GPU.

For more information about NVIDIA, CUDA, and GPUs:


Intel MKL Library

Intel Math Kernel Library (MKL) contains ScaLAPACK, LAPACK, Sparse Solver, BLAS, Sparse BLAS, CBLAS, GMP, FFTs, DFTs, VSL, VML, and Interval Arithmetic routines. MKL resides in the directory stored in the environment variable MKL_HOME, after loading a version of the Intel compiler with module.

By using module load to load an Intel compiler, your environment will have several variables set up to help link applications with MKL. Here are some example combinations of simplified linking options:

$ module load intel
$ echo $LINK_LAPACK
-L${MKL_HOME}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

$ echo $LINK_LAPACK95
-L${MKL_HOME}/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread

ITaP recommends that you use the provided variables to define MKL linking options in your compiling procedures. The Intel compiler modules also provide two other environment variables, LINK_LAPACK_STATIC and LINK_LAPACK95_STATIC, which you may use if you need to link MKL statically.
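
For example, a minimal sketch of linking a Fortran program against LAPACK from MKL using the provided variable (the source file name is illustrative):

$ module load intel
$ ifort myprogram.f90 -o myprogram $LINK_LAPACK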

ITaP recommends that you use dynamic linking of libguide. If so, define LD_LIBRARY_PATH such that you are using the correct version of libguide at run time. If you use static linking of libguide, then:

  • If you use the Intel compilers, link in the libguide version that comes with the compiler (use the -openmp option).
  • If you do not use the Intel compilers, link in the libguide version that comes with the Intel MKL above.



Provided Compilers on Carter

Compilers are available on Carter for Fortran, C, and C++. Compiler sets from Intel and GNU are installed. A full list of compiler versions installed on Carter is available in the software catalog. More detailed documentation on each compiler set available on Carter follows.

On Carter, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • Intel 16.0.1.150
  • MKL
  • Intel MPI

To load the recommended set:

$ module load rcac
$ module list

More information about using these compilers:

    GNU Compilers
    Intel Compilers


GNU Compilers

The official name of the GNU compilers is "GNU Compiler Collection" or "GCC". To discover which versions are available:

$ module avail gcc

Choose an appropriate GCC module and load it. For example:

$ module load gcc

An older version of the GNU compiler will be in your path by default. Do NOT use this version. Instead, load a newer version using the command module load gcc.
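
To confirm which GCC is active after loading the module, you can ask the compiler for its version (the exact output depends on the module you loaded):

$ module load gcc
$ gcc --version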

Here are some examples for the GNU compilers:
For each language, the commands below show how to compile a serial program, an MPI program, and an OpenMP program, in that order:
Fortran77
$ gfortran myprogram.f -o myprogram
$ mpif77 myprogram.f -o myprogram
$ gfortran -fopenmp myprogram.f -o myprogram
Fortran90
$ gfortran myprogram.f90 -o myprogram
$ mpif90 myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f90 -o myprogram
Fortran95
$ gfortran myprogram.f95 -o myprogram
$ mpif90 myprogram.f95 -o myprogram
$ gfortran -fopenmp myprogram.f95 -o myprogram
C
$ gcc myprogram.c -o myprogram
$ mpicc myprogram.c -o myprogram
$ gcc -fopenmp myprogram.c -o myprogram
C++
$ g++ myprogram.cpp -o myprogram
$ mpiCC myprogram.cpp -o myprogram
$ g++ -fopenmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module.



Intel Compilers

One or more versions of the Intel compiler are available on Carter. To discover which ones:

$ module avail intel

Choose an appropriate Intel module and load it. For example:

$ module load intel
Here are some examples for the Intel compilers:
For each language, the commands below show how to compile a serial program, an MPI program, and an OpenMP program, in that order:
Fortran77
$ ifort myprogram.f -o myprogram
$ mpiifort myprogram.f -o myprogram
$ ifort -openmp myprogram.f -o myprogram
Fortran90
$ ifort myprogram.f90 -o myprogram
$ mpiifort myprogram.f90 -o myprogram
$ ifort -openmp myprogram.f90 -o myprogram
Fortran95 (same as Fortran 90 for serial, MPI, and OpenMP programs)
C
$ icc myprogram.c -o myprogram
$ mpiicc myprogram.c -o myprogram
$ icc -openmp myprogram.c -o myprogram
C++
$ icpc myprogram.cpp -o myprogram
$ mpiicpc myprogram.cpp -o myprogram
$ icpc -openmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module.



Running Jobs

Jobs are submitted to Carter using PBS, which performs job scheduling. Jobs may be any type of program. You may use either batch or interactive mode to run your jobs. Use batch mode for finished programs; use interactive mode only for debugging.

In this section, you'll find a few pages describing the basics of creating and submitting PBS jobs, as well as a number of example PBS jobs that you may be able to adapt to your own needs.

    Basics of PBS Jobs
        Job Submission Script
        Submitting a Job
        Checking Job Status
        Checking Job Output
        Holding a Job
        Job Dependencies
        Canceling a Job
        Node Access Policies
        Queues

    Example Jobs
        Generic PBS Jobs
            Batch
            Multiple Node
            Specific Types of Nodes
            Interactive Jobs
            Serial Jobs
            MPI
            OpenMP
            Hybrid
            GPU

        Specific Applications
            Gaussian
            Maple
            Mathematica
            Matlab
                Matlab Script (.m File)
                Implicit Parallelism
                Profile Manager
                Parallel Computing Toolbox (parfor)
                Parallel Toolbox (spmd)
                Distributed Computing Server (parallel job)

            Octave
            Perl
            Python
            R
            SAS
            Singularity
            Spark
                Spark

            Tensorflow on Carter
            Windows




Basics of PBS Jobs

The Portable Batch System (PBS) is a system providing job scheduling and job management on compute clusters. With PBS, a user requests resources and submits a job to a queue. The system will then take jobs from queues, allocate the necessary nodes, and execute them.

Do NOT run large, long, multi-threaded, parallel, or CPU-intensive jobs on a front-end login host. All users share the front-end hosts, and running anything but the smallest test job will negatively impact everyone's ability to use Carter. Always use PBS to submit your work as a job.

Submitting a Job

The main steps to submitting a job are: creating a job submission script, submitting the script to a queue with qsub, and then monitoring the job's status and checking its output.

Follow the links below for information on these steps and other basic information about jobs; a minimal end-to-end sketch also appears after the list. A number of example PBS jobs are also available.

    Job Submission Script
    Submitting a Job
    Checking Job Status
    Checking Job Output
    Holding a Job
    Job Dependencies
    Canceling a Job
    Node Access Policies
    Queues
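
As a quick orientation, here is a minimal sketch of the whole workflow, using the placeholder names used throughout this guide (myjobsubmissionfile, myusername, myjobid); adapt the queue and resource request to your own work:

$ qsub -q standby -l nodes=1:ppn=1,naccesspolicy=shared,walltime=00:10:00 myjobsubmissionfile
$ qstat -a -u myusername
$ cat myjobsubmissionfile.omyjobid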


Job Submission Script

To submit work to a PBS queue, you must first create a job submission file. This job submission file is essentially a simple shell script. It will set any required environment variables, load any necessary modules, create or modify files and directories, and run any applications that you need:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Loads Matlab and sets the application up
module load matlab

# Change to the directory from which you originally submitted this job.
cd $PBS_O_WORKDIR

# Runs a Matlab script named 'myscript'
matlab -nodisplay -singleCompThread -r myscript

Once your script is prepared, you are ready to submit your job.

Job Script Environment Variables

PBS sets several potentially useful environment variables which you may use within your job submission files. Here is a list of some:
Name Description
PBS_O_WORKDIR Absolute path of the current working directory when you submitted this job
PBS_JOBID Job ID number assigned to this job by the batch system
PBS_JOBNAME Job name supplied by the user
PBS_NODEFILE File containing the list of nodes assigned to this job
PBS_O_HOST Hostname of the system where you submitted this job
PBS_O_QUEUE Name of the original queue to which you submitted this job
PBS_O_SYSTEM Operating system name given by uname -s where you submitted this job
PBS_ENVIRONMENT "PBS_BATCH" if this job is a batch job, or "PBS_INTERACTIVE" if this job is an interactive job


Submitting a Job

Once you have a job submission file, you may submit this script to PBS using the qsub command. PBS will find, or wait for, an available processor core or a set of processor cores and run your job there. At submission time, you may also optionally specify many other attributes or job requirements you have regarding where your jobs will run.

To submit your job to one compute node with no special requirements:

$ qsub myjobsubmissionfile

To submit your job to a specific queue:

$ qsub -q myqueuename myjobsubmissionfile

By default, each job receives 30 minutes of wall time for its execution. The wall time is the total time in real clock time (not CPU cycles) that you believe your job will need to run to completion. If you know that your job will not need more than a certain amount of time to run, it is very much to your advantage to request less than the maximum allowable wall time, as this may allow your job to schedule and run sooner. To request the specific wall time of 1 hour and 30 minutes:

$ qsub -l walltime=01:30:00 myjobsubmissionfile

The nodes resource indicates how many compute nodes you would like reserved for your job.

Each compute node in Carter has 16 processor cores. Detailed explanations regarding the distribution of your job across different compute nodes for parallel programs appear in the sections covering specific parallel programming libraries.

To request 2 compute nodes with 16 processor cores per node

$ qsub -l nodes=2:ppn=16 myjobsubmissionfile

To submit a job using 1 compute node with 4 processor cores:

$ qsub -l nodes=1:ppn=4,naccesspolicy=shared myjobsubmissionfile 

Please note that when naccesspolicy=singleuser is specified, the scheduler ensures that only jobs from the same user are allocated on a node. So, if your singleuser jobs do not fill all the cores on a node, you would still occupy all 16 cores of that node in your queue.

If more convenient, you may also specify any command line options to qsub from within your job submission file, using a special form of comment:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

#PBS -V
#PBS -q myqueuename
#PBS -l nodes=1:ppn=1,naccesspolicy=shared 
#PBS -l walltime=01:30:00
#PBS -N myjobname

# Print the hostname of the compute node on which this job is running.
/bin/hostname

If an option is present in both your job submission file and on the command line, the option on the command line will take precedence.
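
For example, with the submission file shown above (which requests walltime=01:30:00 via a #PBS directive), a wall time given on the command line wins; here the job would receive 4 hours (an illustrative value):

$ qsub -l walltime=04:00:00 myjobsubmissionfile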

After you submit your job with qsub, it can reside in a queue for minutes, hours, or even weeks. How long it takes for a job to start depends on the specific queue, the number of compute nodes requested, the amount of wall time requested, and what other jobs already waiting in that queue requested as well. It is impossible to say for sure when any given job will start. For best results, request no more resources than your job requires.

Once your job is submitted, you can monitor the job status, wait for the job to complete, and check the job output.


Checking Job Status

Once a job is submitted there are several commands you can use to monitor the progress of the job.

To see your jobs, use the qstat -u command and specify your username:

$ qstat -a -u myusername

carter-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
182792.carter-adm  myusername   standby job1        28422   1   4    --  23:00 R 20:19
185841.carter-adm  myusername   standby job2        24445   1   4    --  23:00 R 20:19
185844.carter-adm  myusername   standby job3        12999   1   4    --  23:00 R 20:18
185847.carter-adm  myusername   standby job4        13151   1   4    --  23:00 R 20:18

To retrieve useful information about your queued or running job, use the checkjob command with your job's ID number. The output should look similar to the following:

$ checkjob -v 163000

job 163000 (RM job '163000.carter-adm.rcac.purdue.edu')

AName: test
State: Idle
Creds:  user:myusername  group:mygroup  class:myqueue
WallTime:   00:00:00 of 20:00:00
SubmitTime: Wed Apr 18 09:08:37
  (Time Queued  Total: 1:24:36  Eligible: 00:00:23)

NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 2
Total Requested Nodes: 1

Req[0]  TaskCount: 2  Partition: ALL
TasksPerNode: 2  NodeCount:  1

Notification Events: JobFail

IWD:            /home/myusername/gaussian
UMask:          0000
OutputFile:     carter-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.o163000
ErrorFile:      carter-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.e163000
User Specified Partition List:   carter-adm,SHARED
Partition List: carter-adm
SrcRM:          carter-adm  DstRM: carter-adm  DstRMJID: 163000.carter-adm.rcac.purdue.edu
Submit Args:    -l nodes=1:ppn=2,walltime=20:00:00 -q myqueue
Flags:          RESTARTABLE
Attr:           checkpoint
StartPriority:  1000
PE:             2.00
NOTE:  job violates constraints for partition carter-adm (job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160))

BLOCK MSG: job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160) (recorded at last scheduling iteration)

There are several useful bits of information in this output.

  • State lets you know if the job is Idle, Running, Completed, or Held.
  • WallTime will show how long the job has run and its maximum time.
  • SubmitTime is when the job was submitted to the cluster.
  • Total Requested Tasks is the total number of cores used for the job.
  • Total Requested Nodes and NodeCount are the number of nodes used for the job.
  • TasksPerNode is the number of cores used per node.
  • IWD is the job's working directory.
  • OutputFile and ErrorFile are the locations of stdout and stderr of the job, respectively.
  • Submit Args will show the arguments given to the qsub command.
  • NOTE/BLOCK MSG will show details on why the job isn't running. The above error says that all the cores are in use on that queue and the job has to wait. Other errors may give insight as to why the job fails to start or is held.

To view the output of a running job, use the qpeek command with your job's ID number. The -f option will continually output to the screen similar to tail -f, while qpeek without options will just output the whole file so far. Here is an example output from an application:

$ qpeek -f 1651025
TIMING: 600  CPU: 97.0045, 0.0926592/step  Wall: 97.0045, 0.0926592/step, 0.11325 hours remaining, 809.902344 MB of memory in use.
ENERGY:     600    359272.8746    280667.4810     81932.7038      5055.7519       -4509043.9946    383233.0971         0.0000         0.0000    947701.9550       -2451180.1312       298.0766  -3398882.0862  -2442581.9707       298.2890           1125.0475        77.0325  10193721.6822         3.5650         3.0569

TIMING: 800  CPU: 118.002, 0.104987/step  Wall: 118.002, 0.104987/step, 0.122485 hours remaining, 809.902344 MB of memory in use.
ENERGY:     800    360504.1138    280804.0922     82052.0878      5017.1543       -4511471.5475    383214.3057         0.0000         0.0000    946597.3980       -2453282.3958       297.7292  -3399879.7938  -2444652.9520       298.0805            978.4130        67.0123  10193578.8030        -0.1088         0.2596

TIMING: 1000  CPU: 144.765, 0.133817/step  Wall: 144.765, 0.133817/step, 0.148686 hours remaining, 809.902344 MB of memory in use.
ENERGY:    1000    361525.2450    280225.2207     81922.0613      5126.4104       -4513315.2802    383460.2355         0.0000         0.0000    947232.8722       -2453823.2352       297.9291  -3401056.1074  -2445219.8163       297.9184            823.8756        43.2552  10193174.7961        -0.7191        -0.2392
...


Checking Job Output

Once your job has run to completion and no longer appears in the qstat output, it is complete and its output is ready to be examined.

PBS catches output written to standard output and standard error - what would be printed to your screen if you ran your program interactively. Unless you specified otherwise, PBS will put the output in the directory where you submitted the job.

Standard out will appear in a file whose extension begins with the letter "o", for example myjobsubmissionfile.o1234, where "1234" represents the PBS job ID. Errors that occurred during the job run and written to standard error will appear in your directory in a file whose extension begins with the letter "e", for example myjobsubmissionfile.e1234.
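
For example, using the job ID 1234 from above, you could inspect both files once the job finishes:

$ cat myjobsubmissionfile.o1234
$ cat myjobsubmissionfile.e1234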

If your program writes its own output files, those files will be created as defined by the program. This may be in the directory where the program was run, or may be defined in a configuration or input file. You will need to check the documentation for your program for more details.

Redirecting Job Output

It is possible to redirect job output to somewhere other than the default location with the -e and -o directives:

#! /bin/sh -l
#PBS -o /home/myusername/joboutput/myjob.out
#PBS -e /home/myusername/joboutput/myjob.out

# This job prints "Hello World" to output and exits
echo "Hello World"


Holding a Job

Sometimes you may want to submit a job but not have it run just yet. For example, you may want to allow labmates to cut in front of you in the queue: hold the job until their jobs have started, and then release yours.

To place a hold on a job before it starts running, use the qhold command:

$ qhold myjobid

Once a job has started running, it cannot be placed on hold.

To release a hold on a job, use the qrls command:

$ qrls myjobid

You can find the job ID using the qstat command, as explained in the Checking Job Status section.


Job Dependencies

Dependencies are an automated way of holding and releasing jobs. Jobs with a dependency are held until the condition is satisfied. Once the condition is satisfied, the jobs become eligible to run and must still queue as normal.

Job dependencies may be configured to ensure jobs start in a specified order. Jobs can be configured to run after other job state changes, such as when the job starts or the job ends.

These examples illustrate setting dependencies in several ways. Typically dependencies are set by capturing and using the job ID from the last job submitted.

To run a job after job myjobid has started:

$ qsub -W depend=after:myjobid myjobsubmissionfile

To run a job after job myjobid ends without error:

$ qsub -W depend=afterok:myjobid myjobsubmissionfile

To run a job after job myjobid ends with errors:

$ qsub -W depend=afternotok:myjobid myjobsubmissionfile

To run a job after job myjobid ends with or without errors:

$ qsub -W depend=afterany:myjobid myjobsubmissionfile

To set more complex dependencies on multiple jobs and conditions:

$ qsub -W depend=after:myjobid1:myjobid2:myjobid3,afterok:myjobid4 myjobsubmissionfile
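
A common pattern, sketched here with illustrative file names, is to capture the job ID that qsub prints and feed it to the next submission:

first=$(qsub first_step.sub)
qsub -W depend=afterok:$first second_step.sub

qsub prints the full job identifier, which may be used directly in the depend= option.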


Canceling a Job

To stop a job before it finishes or remove it from a queue, use the qdel command:

$ qdel myjobid

You can find the job ID using the qstat command, as explained in the Checking Job Status section.


Node Access Policies

Node access policy determines how the scheduler allocates a job on a node. By default, jobs are scheduled according to the default policy specified in the queue configuration; however, you can change this by specifying the naccesspolicy option to the qsub command. The syntax of this option is:

qsub -l naccesspolicy=policy ... other arguments ... 
where policy can be one of the following:
shared
    Favorable use case: lots of small jobs that need little memory
    Explanation: jobs from any user can run on a node
    Advantages: jobs start sooner; efficient use of the community cluster
    Disadvantages: jobs may contend for resources, especially memory

singleuser
    Favorable use case: lots of small jobs that pack densely on one or more nodes
    Explanation: all jobs running on a node must be owned by the same user
    Advantages: jobs start sooner; jobs only contend with your own jobs
    Disadvantages: takes up all 16 cores on a node even if not using them

singlejob
    Favorable use case: wide jobs or jobs that use large amounts of memory
    Explanation: only one job can run on a node
    Advantages: no contention with other jobs
    Disadvantages: takes up all 16 cores on a node even if not using them

An example to submit a job in shared mode:

qsub -q myqueue -l nodes=1:ppn=1,walltime=00:30:00,naccesspolicy=shared myjobscript.sub

An example to submit a job in singleuser mode:

qsub -q myqueue -l nodes=1:ppn=4,walltime=00:30:00,naccesspolicy=singleuser myjobscript.sub

Please note that, in singleuser and singlejob modes, your queue allocation would be deducted by a multiple of 16 even if you are not using all the cores. For example, if you run 3 jobs with nodes=1:ppn=8, then in singleuser mode, you would be occupying 2 whole nodes (32 cores) from your queue even though the jobs are only utilizing 24 cores. Similarly, in singlejob mode, you would be occupying 3 whole nodes (48 cores) from your queue.

The default node access policy on Carter is shared.

Queues

Partner Queues

Carter, as a community cluster, has one or more queues dedicated to and named after each partner who has purchased access to the cluster. These queues provide partners and their researchers with priority access to their portion of the cluster. Jobs in these queues are typically limited to 336 hours. The expectation is that any job submitted to a named partner queue will start within 4 hours, assuming the queue currently has enough capacity for the job (that is, your labmates aren't already using all of the cores).

Standby Queue

Additionally, community clusters provide a "standby" queue which is available to all cluster users. This "standby" queue allows users to utilize portions of the cluster that would otherwise be idle, but at a lower priority than partner-queue jobs, and with a relatively short time limit, to ensure "standby" jobs will not be able to tie up resources and prevent partner-queue jobs from running quickly. Jobs in standby are limited to 4 hours. There is no expectation of job start time. If the cluster is very busy with partner queue jobs, or you are requesting a very large job, jobs in standby may take hours or days to start.

Debug Queue

The debug queue allows you to quickly start small, short, interactive jobs in order to debug code, test programs, or test configurations. You are limited to one running job at a time in this queue, and you may use up to two compute nodes for 30 minutes. The expectation is that debug jobs should start within a couple of minutes, assuming the queue's dedicated nodes are not all in use.
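
For example, a quick way to start an interactive session in the debug queue (a sketch; adjust the node count and wall time to your test, within the limits above):

$ qsub -q debug -I -l nodes=1:ppn=16,walltime=00:30:00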

To see a list of all queues on Carter that you may submit to, use the qlist command:

$ qlist

                          Current Number of Cores
Queue                 Total     Queue   Run     Free         Max Walltime
===============    ====================================     ==============
debug                  64        0       0      64           0:30:00
myqueue                16       32      8      8           336:00:00 
standby             9,584    7,384   4,678      98           4:00:00 

This lists each queue you can submit to, the number of cores allocated to the queue, the total number of cores queued in jobs waiting to run, how many cores are in use, and how many are available to run jobs. The maximum walltime you may request is also listed. This command can be used to get a general idea of how busy a queue is and how long you may have to wait for your job to start.


Example Jobs

A number of example jobs are available for you to look over and adapt to your own needs. The first few are generic examples, and latter ones go into specifics for particular software packages.

    Generic PBS Jobs
        Batch
        Multiple Node
        Specific Types of Nodes
        Interactive Jobs
        Serial Jobs
        MPI
        OpenMP
        Hybrid
        GPU

    Specific Applications
        Gaussian
        Maple
        Mathematica
        Matlab
            Matlab Script (.m File)
            Implicit Parallelism
            Profile Manager
            Parallel Computing Toolbox (parfor)
            Parallel Toolbox (spmd)
            Distributed Computing Server (parallel job)

        Octave
        Perl
        Python
        R
        SAS
        Singularity
        Spark
            Spark

        Tensorflow on Carter
        Windows


Generic PBS Jobs

The following examples demonstrate the basics of PBS jobs, and are designed to cover common job request scenarios. These example jobs will need to be modified to run your application or code.

    Batch
    Multiple Node
    Specific Types of Nodes
    Interactive Jobs
    Serial Jobs
    MPI
    OpenMP
    Hybrid
    GPU


Batch

This simple example submits the job submission file hello.sub to the standby queue on Carter and requests 4 nodes:

$ qsub -q standby -l nodes=4:ppn=16,walltime=00:01:00 hello.sub
99.carter-adm.rcac.purdue.edu

Remember that ppn can not be larger than the number of processor cores on each node.

After your job finishes running, the ls command will show two new files in your directory, the .o and .e files:

$ ls -l
hello
hello.c
hello.out
hello.sub
hello.sub.e99
hello.sub.o99

If everything went well, then the file hello.sub.e99 will be empty, since it contains any error messages your program gave while running. The file hello.sub.o99 contains the output from your program.

Using Environment Variables in a Job

If you would like to see the value of the environment variables from within a PBS job, you can prepare a job submission file with an appropriate filename, here named env.sub:

#!/bin/sh -l
# FILENAME:  env.sub

# Request four nodes, 1 processor core on each.
#PBS -l nodes=4:ppn=1,walltime=00:01:00

# Change to the directory from which you submitted your job.
cd $PBS_O_WORKDIR

# Show details, especially nodes.
# The output of the following commands appears in the job's output (.o) file.
echo $PBS_O_HOST
echo $PBS_O_QUEUE
echo $PBS_O_SYSTEM
echo $PBS_O_WORKDIR
echo $PBS_ENVIRONMENT
echo $PBS_JOBID
echo $PBS_JOBNAME

# PBS_NODEFILE contains the names of assigned compute nodes.
cat $PBS_NODEFILE

Submit this job:

$ qsub env.sub


Multiple Node

This section illustrates various requests for one or multiple compute nodes and ways of allocating the processor cores on these compute nodes. Each example submits a job submission file (myjobsubmissionfile.sub) to a batch session. The job submission file contains a single command cat $PBS_NODEFILE to show the names of the compute node(s) allocated. The list of compute node names indicates the geometry chosen for the job:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE

All examples use the default queue of the cluster.

One processor core on any compute node

A job shares the other resources, in particular the memory, of the compute node with other jobs. This request is typical of a serial job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=shared myjobsubmissionfile.sub

Compute node allocated:

carter-a139

Two processor cores on any compute nodes

This request is typical of a distributed-memory (MPI) job:

$ qsub -l nodes=2:ppn=1 myjobsubmissionfile.sub

Compute nodes allocated:

carter-a139
carter-a138

All processor cores on one compute node

The option ppn can not be larger than the number of cores on each compute node on the machine in question. This request is typical of a shared-memory (OpenMP) job:

$ qsub -l nodes=1:ppn=16 myjobsubmissionfile.sub

Compute node allocated:

carter-a137
...
All processor cores on any two compute nodes

The option ppn can not be larger than the number of processor cores on each compute node on the machine in question. This request is typical of a hybrid (distributed-memory and shared-memory) job:

$ qsub -l nodes=2:ppn=16 myjobsubmissionfile.sub

Compute nodes allocated:

carter-a139
...
carter-a138
...

To gain exclusive access to a compute node, specify all processor cores that are physically available on a compute node:

$ qsub -l nodes=1:ppn=16 myjobsubmissionfile.sub

Compute node allocated:

carter-a005
...

This request is typical of a serial job that needs access to all of the memory of a compute node.


Specific Types of Nodes

PBS allows you to direct a job to specific types of compute nodes based on properties such as sub-cluster or available memory. These requests are made through the nodes= option and related resource options shown below.

This example submits a job submission file, here named myjobsubmissionfile.sub. The job submission file contains a single command cat $PBS_NODEFILE to show the allocated compute node(s). The example uses the default queue of the cluster.

Example: a job requires a compute node in an "A" sub-cluster:

$ qsub -l nodes=1:A myjobsubmissionfile.sub

Compute node allocated:

carter-a009

Example: a job requires a compute node with 32 GB of physical memory:

$ qsub -l nodes=1:32G myjobsubmissionfile.sub

Compute node allocated:

carter-a009

Example: a job declares that it would require 32 GB of physical memory for itself (and thus requires a node that has more than that):

$ qsub -l nodes=1,mem=32gb myjobsubmissionfile.sub

Compute node allocated:

carter-b009

Note that the mem=32gb job above does not run on a 32 GB node. Since the operating system requires some memory for itself (possibly about 2 GB), a mem=32gb job will not fit into such a node, and PBS will place the job on a larger-memory node. If the requested mem= value is greater than the free RAM in the largest available node, the job will never start.

The first two examples above (the A and 32G keywords) refer to node properties, while the third example above (the mem=32gb keyword) declares a job property. By using node properties, you can direct your job to the desired node type ("give me a 32 GB node" or "give me a node in sub-cluster A"). Using job properties allows you to state what your job requires and lets the scheduler find any node which meets these requirements (i.e. "give me a node that is capable of fitting my 32 GB job"). The former will go to 32 GB nodes, while the latter may end up on any of the larger-memory nodes, whichever is available.

Refer to the Detailed Hardware Specification section for a list of available sub-cluster labels and their respective per-node memory sizes.


Interactive Jobs

Interactive jobs are run on compute nodes, while giving you a shell to interact with. They give you the ability to type commands or use a graphical interface as if you were on a front-end.

If you request an interactive job without a wall time option, PBS assigns to your job the default wall time limit for the queue to which you submit (typically 30 minutes). If this is shorter than the time you actually need, your job will terminate before completion.

To submit an interactive job with one hour of wall time, use the -I option to qsub:

$ qsub -I -l walltime=01:00:00
qsub: waiting for job 100.carter-adm.rcac.purdue.edu to start
qsub: job 100.carter-adm.rcac.purdue.edu ready

If you need to use a remote X11 display from within your job (see the ThinLinc section), add the -X option to qsub as well:

$ qsub -I -l nodes=1:ppn=16 -l walltime=01:00:00 -X
qsub: waiting for job 101.carter-adm.rcac.purdue.edu to start
qsub: job 101.carter-adm.rcac.purdue.edu ready

To quit your interactive job:

logout


Serial Jobs

This section illustrates how to use PBS to submit to a batch session one of the serial programs compiled in the section Compiling Serial Programs.

Suppose that you named your executable file serial_hello. Prepare a job submission file with an appropriate filename, here named serial_hello.sub:

#!/bin/sh -l
# FILENAME:  serial_hello.sub

cd $PBS_O_WORKDIR

./serial_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file, or give the full path to the directory containing the executable program.

Submit the serial job to the default queue on Carter and request 1 compute node with 1 processor core and 1 minute of wall time:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=00:01:00 ./serial_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
serial_hello
serial_hello.c
serial_hello.sub
serial_hello.sub.emyjobid
serial_hello.sub.omyjobid

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:carter-a139.rcac.purdue.edu   hello, world

If the job failed to run, then view error messages in the file serial_hello.sub.emyjobid.

If a serial job uses a lot of memory and finds the memory of a compute node overcommitted while sharing the compute node with other jobs, specify the number of processor cores physically available on the compute node to gain exclusive use of the compute node:

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 serial_hello.sub

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:carter-a139.rcac.purdue.edu   hello, world

ParaFly

ParaFly is a helper program, available via module load parafly, that can be used to run multiple processes on one node by reading commands from a file. It keeps track of the commands being run and their success or failure, and keeps a specified number of CPU cores on the node busy with commands from the file.

For instance, assume you have a file called params.txt with the following 500 lines in it:

runcommand param-1
runcommand param-2
runcommand param-3
runcommand param-4
...
runcommand param-500

You can then run ParaFly with this command:

ParaFly  -c params.txt -CPU 16 -failed_cmds rerun.txt

and ParaFly will manage the 500 'runcommand' commands, keeping 16 of them active at all times, and copying the ones that failed into a file called rerun.txt.

This gives you a way to execute many (ParaFly has been used with upwards of 10,000 commands in its command file) single-core commands in a single PBS job running on a single exclusively allocated node, rather than submitting each of them as a separate job.
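
If your parameter sweep is regular, the command file itself can be generated with a short loop instead of being written by hand. This is only a sketch using the hypothetical runcommand and param-N names from the example above:

#!/bin/bash
# Write one 'runcommand' line per parameter into params.txt
for i in $(seq 1 500); do
    echo "runcommand param-$i"
done > params.txt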

So, if you have the params.txt file in the above example, you could submit the following PBS submission file:

#!/bin/bash
#PBS -q standby
#PBS -l nodes=1:ppn=16
#PBS -l walltime=2:00:00

cd $PBS_O_WORKDIR

module load parafly
ParaFly -c params.txt -CPU 16 -failed_cmds rerun.txt

This would run all 500 'runcommand' commands with their associated parameters on the same node, 16 at a time.

ParaFly command files are not bash scripts themselves; instead they are a list of one-line commands that are executed individually by bash. This means that each command line can use input or output redirection, or different command line options. For example:

command1 -opt1 val1 < input1 > output1
command2 -opt2 val2 < input2 > output2
command3 -opt3 val3 < input3 > output3
...
command500 -opt500 val500 < input500 > output500

Note that there is no guarantee of order of execution using ParaFly, so you cannot rely on output from one command being available as input for another.


MPI

An MPI (message-passing) job is a set of processes that take advantage of distributed-memory systems by communicating with each other. Work occurs across several compute nodes of a distributed-memory system. The Message-Passing Interface (MPI) is a standardized specification of the message-passing model, provided as a collection of library functions. OpenMPI and Intel MPI (IMPI) are implementations of the MPI standard.

This section illustrates how to use PBS to submit to a batch session one of the MPI programs compiled in the section Compiling MPI Programs.

Use module load to set up the paths to access these libraries. Use module avail to see all MPI packages installed on Carter.

Suppose that you named your executable file mpi_hello. Prepare a job submission file with an appropriate filename, here named mpi_hello.sub:

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

cd $PBS_O_WORKDIR

mpiexec -n 32 ./mpi_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file, or give the full path to the directory containing the executable program.

You invoke an MPI program with the mpiexec command. The number of processes is requested with the -n option and is typically equal to the total number of processor cores you request from PBS (more on this below).
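
Rather than hard-coding the rank count, you can derive it from the node file that PBS provides, so the script stays correct if you change the resource request (a small sketch of the same mpi_hello.sub):

# The node file contains one line per allocated processor core.
nranks=$(wc -l < $PBS_NODEFILE)
mpiexec -n $nranks ./mpi_hello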

Submit the MPI job to the default queue on Carter and request 2 whole compute nodes, with 16 MPI ranks on each compute node (32 ranks in total), and 1 minute of wall time:

$ qsub -l nodes=2:ppn=16,walltime=00:01:00 ./mpi_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
mpi_hello
mpi_hello.c
mpi_hello.sub
mpi_hello.sub.emyjobid
mpi_hello.sub.omyjobid

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:carter-a010.rcac.purdue.edu   Rank:0 of 32 ranks   hello, world
Runhost:carter-a010.rcac.purdue.edu   Rank:1 of 32 ranks   hello, world
...
Runhost:carter-a011.rcac.purdue.edu   Rank:16 of 32 ranks   hello, world
Runhost:carter-a011.rcac.purdue.edu   Rank:17 of 32 ranks   hello, world
...

If the job failed to run, then view error messages in the file mpi_hello.sub.emyjobid.

If an MPI job uses a lot of memory and 16 MPI ranks per compute node use all of the memory of the compute nodes, request more compute nodes, while keeping the total number of MPI ranks unchanged.

Submit the job with double the number of compute nodes and modify the node list to halve the number of MPI ranks per compute node (the total number of MPI ranks remains unchanged):

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

cd $PBS_O_WORKDIR

# Keep every other line of the node list, leaving 8 MPI ranks per compute node.
awk 'NR%2 != 0' < $PBS_NODEFILE > nodefile

mpiexec -n 32 -machinefile ./nodefile ./mpi_hello

$ qsub -l nodes=4:ppn=16,walltime=00:01:00 ./mpi_hello.sub

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:carter-a010.rcac.purdue.edu   Rank:0 of 32 ranks   hello, world
Runhost:carter-a010.rcac.purdue.edu   Rank:1 of 32 ranks   hello, world
...
Runhost:carter-a011.rcac.purdue.edu   Rank:8 of 32 ranks   hello, world
...
Runhost:carter-a012.rcac.purdue.edu   Rank:16 of 32 ranks   hello, world
...
Runhost:carter-a013.rcac.purdue.edu   Rank:24 of 32 ranks   hello, world
...

Notes

  • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Carter is "standby".
  • Invoking an MPI program on Carter with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke an MPI program.
  • In general, the exact order in which MPI ranks output similar write requests to an output file is random.



OpenMP

A shared-memory job is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over several processor cores of a multi-core processor. Open Multi-Processing (OpenMP) is a specification of the shared-memory model, consisting of parallelization directives, library routines, and environment variables.

This section illustrates how to use PBS to submit an OpenMP program compiled in the section Compiling OpenMP Programs.

When running OpenMP programs, all threads must be on the same compute node to take advantage of shared memory. The threads cannot communicate between nodes.

To run an OpenMP program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS 16

In bash:

$ export OMP_NUM_THREADS=16

Suppose that you named your executable file omp_hello. Prepare a job submission file with an appropriate name, here named omp_hello.sub:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=16
./omp_hello 

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file, or give the full path to the directory containing the program.

Submit the OpenMP job to request 1 complete compute node with all 16 processor cores on the compute node and 1 minute of wall time.

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 omp_hello.sub 

View two new files in your directory (.o and .e):

$ ls -l
omp_hello
omp_hello.c
omp_hello.sub
omp_hello.sub.emyjobid
omp_hello.sub.omyjobid

View the results from one of the sample OpenMP programs about task parallelism:

$ cat omp_hello.sub.omyjobid
SERIAL REGION:     Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 16 threads   hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:1 of 16 threads   hello, world
   ...

If the job failed to run, then view error messages in the file omp_hello.sub.emyjobid.

If an OpenMP program uses a lot of memory and 16 threads use all of the memory of the compute node, use fewer processor cores (OpenMP threads) on that compute node.

Modify the job submission file omp_hello.sub to use half the number of processor cores:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8
./omp_hello

Be sure to request the whole node or other jobs may use the extra memory your job requires.

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 omp_hello.sub

View the results from one of the sample OpenMP programs about task parallelism and using half the number of processor cores:

$ cat omp_hello.sub.omyjobid
SERIAL REGION:     Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...


Hybrid

A hybrid job combines both MPI and OpenMP attributes to take advantage of distributed-memory systems with multi-core processors. Work occurs across several compute nodes of a distributed-memory system and across the processor cores of the multi-core processors.

This section illustrates how to use PBS to submit a hybrid program compiled in the section Compiling Hybrid Programs.

The path to relevant MPI libraries is not set up on any compute node by default. Using module load is the way to access these libraries. Use module avail to see all MPI packages installed on Carter.

To run a hybrid program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS 16

In bash:

$ export OMP_NUM_THREADS=16

Suppose that you named your executable file hybrid_hello. Prepare a job submission file with an appropriate filename, here named hybrid_hello.sub:

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

cd $PBS_O_WORKDIR
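# $PBS_NODEFILE lists each node once per allocated core; uniq keeps one line
# per node so that mpiexec starts a single MPI rank on each node.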
uniq <$PBS_NODEFILE >nodefile
export OMP_NUM_THREADS=16
mpiexec -n 2 -machinefile nodefile ./hybrid_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file, or give the full path to the directory containing the executable program.

You invoke a hybrid program with the mpiexec command. You may need to specify how to place the threads on the compute node. Several examples on how to specify thread placement with various MPI libraries are at the bottom of this section.

Submit the hybrid job to the default queue on Carter and request 2 whole compute nodes with 1 MPI rank on each compute node (each using all 16 cores as OpenMP threads) and 1 minute of wall time.

$ qsub -l nodes=2:ppn=16,walltime=00:01:00 hybrid_hello.sub
179168.carter-adm.rcac.purdue.edu

View two new files in your directory (.o and .e):

$ ls -l
hybrid_hello
hybrid_hello.c
hybrid_hello.sub
hybrid_hello.sub.emyjobid
hybrid_hello.sub.omyjobid

View the results from one of the sample hybrid programs about task parallelism:

$ cat hybrid_hello.sub.omyjobid
SERIAL REGION:     Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 16 threads   hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:1 of 16 threads   hello, world
   ...
PARALLEL REGION:   Runhost:carter-a045.rcac.purdue.edu   Thread:0 of 16 threads   hello, world
PARALLEL REGION:   Runhost:carter-a045.rcac.purdue.edu   Thread:1 of 16 threads   hello, world
   ...

If the job failed to run, then view error messages in the file hybrid_hello.sub.emyjobid.

If a hybrid job uses a lot of memory and 16 OpenMP threads per compute node use all of the memory of the compute nodes, request more compute nodes (MPI ranks) and use fewer processor cores (OpenMP threads) on each compute node.

Prepare a job submission file with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile
export OMP_NUM_THREADS=8
mpiexec -n 4 -machinefile nodefile ./hybrid_hello

Submit the job with double the number of compute nodes (MPI ranks). Be sure to request the whole node or other jobs may use the extra memory your job requires.

$ qsub -l nodes=4:ppn=16,walltime=00:01:00 hybrid_hello.sub

View the results from one of the sample hybrid programs about task parallelism with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

$ cat hybrid_hello.sub.omyjobid
SERIAL REGION:     Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:carter-a044.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...
PARALLEL REGION:   Runhost:carter-a045.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:carter-a045.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...
PARALLEL REGION:   Runhost:carter-a046.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:carter-a046.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...
PARALLEL REGION:   Runhost:carter-a047.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:carter-a047.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
   ...

Thread placement

Compute nodes are made up of two or more processor chips, or sockets. Typically each socket shares a memory controller and communication busses for all of its cores. Consider these cores as having "shortcuts" to each other. Cores within a socket will be able to communicate faster and more efficiently amongst themselves than with another socket or compute node. MPI ranks should consequently be placed so that they can utilize these "shortcuts". When running hybrid codes it is essential to specify this placement as by default some MPI libraries will limit a rank to a single core or may scatter a rank across processor chips.

Below are examples on how to specify this placement with several MPI libraries. Hybrid codes should be run within jobs requesting the entire node by either using ppn=16 or the -n exclusive flag or the job may result in unexpected and poor thread placement.

OpenMPI 1.6.3

mpiexec -cpus-per-rank $OMP_NUM_THREADS --bycore -np 2 -machinefile nodefile ./hybrid_loop

OpenMPI 1.8

mpiexec -map-by socket:pe=$OMP_NUM_THREADS -np 2 -machinefile nodefile ./hybrid_loop

Intel MPI

mpiexec -np 2 -machinefile nodefile ./hybrid_loop

Notes

  • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Carter is "standby".
  • Invoking a hybrid program on Carter with ./program is typically wrong, since this will use only one MPI process and defeats the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke a hybrid program.
  • In general, the exact order in which MPI processes of a hybrid program output similar write requests to an output file is random.


GPU

Some Carter compute nodes contain NVIDIA GPUs that support CUDA and OpenCL. See the detailed hardware overview for the specifics on the GPUs in Carter.

This section illustrates how to use PBS to submit a simple GPU program.

Suppose that you named your executable file gpu_hello from the sample code gpu_hello.cu. Prepare a job submission file with an appropriate name, here named gpu_hello.sub:

#!/bin/sh -l
# FILENAME:  gpu_hello.sub

module load cuda

cd $PBS_O_WORKDIR

host=`hostname -s`
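# Each line of $PBS_GPUFILE names one allocated GPU (typically of the form host-gpuN);
# keep this host's lines and extract the device numbers (assumes single-digit indices).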
gpus=`cat $PBS_GPUFILE | grep $host | cut -d'-' -f3 | cut -c4 | sort` 
export CUDA_VISIBLE_DEVICES=`echo $gpus | tr ' ' ','`

./gpu_hello

During job run-time, PBS sets an environment variable, $PBS_GPUFILE, which contains the path to a file listing the GPUs allocated to this job. This file is very similar to the node list named by $PBS_NODEFILE.

Submit the job:

$ qsub -q standby -l nodes=1:ppn=16,walltime=00:01:00 gpu_hello.sub

After job completion, view two new files in your directory (.o and .e):

$ ls -l
gpu_hello
gpu_hello.cu
gpu_hello.sub
gpu_hello.sub.emyjobid
gpu_hello.sub.omyjobid

View results in the file for all standard output, gpu_hello.sub.omyjobid:

hello, world

If the job failed to run, then view error messages in the file gpu_hello.sub.emyjobid.

To select which CUDA device to use with a CUDA C program, use the cudaSetDevice( int device ) API call to set the device. All subsequent CUDA memory allocations or kernel launches will be performed on this device. This example takes the device number as the first command line argument:

if (cudaSetDevice(atoi(argv[1])) != cudaSuccess) {
    int num_devices;
    cudaGetDeviceCount(&num_devices);
    fprintf(stderr, "Error initializing device %s, device value must be 0-%d\n", argv[1], (num_devices-1));
    return 0;
}

The device value to use can be determined within the batch submission script. The example below exports the allocated GPU numbers via CUDA_VISIBLE_DEVICES; with that variable set, the allocated GPUs appear to your program renumbered starting from 0:

host=`hostname -s`
gpus=`cat $PBS_GPUFILE | grep $host | cut -d'-' -f3 | cut -c4 | sort` 
export CUDA_VISIBLE_DEVICES=`echo $gpus | tr ' ' ','`
./gpu_hello

Using multiple GPUs within a program will require more complex processing of the $PBS_GPUFILE file.
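
As a starting point, here is a sketch of counting, from within the submission script, how many GPUs on the local host were assigned to the job:

host=`hostname -s`
ngpus=$(grep -c $host $PBS_GPUFILE)
echo "$ngpus GPU(s) allocated on $host"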

Specific Applications

The following examples demonstrate job submission files for some common real-world applications. See the Generic PBS Examples section for more examples on job submissions that can be adapted for use.

    Gaussian
    Maple
    Mathematica
    Matlab
        Matlab Script (.m File)
        Implicit Parallelism
        Profile Manager
        Parallel Computing Toolbox (parfor)
        Parallel Toolbox (spmd)
        Distributed Computing Server (parallel job)

    Octave
    Perl
    Python
    R
    SAS
    Singularity
    Spark
        Spark

    Tensorflow on Carter
    Windows


Gaussian

Gaussian is a computational chemistry software package for electronic structure modeling. This section illustrates how to submit a small Gaussian job to a PBS queue. This Gaussian example runs the Fletcher-Powell multivariable optimization.

Prepare a Gaussian input file with an appropriate filename, here named myjob.com. The final blank line is necessary:

#P TEST OPT=FP STO-3G OPTCYC=2

STO-3G FLETCHER-POWELL OPTIMIZATION OF WATER

0 1
O
H 1 R
H 1 R 2 A

R 0.96
A 104.

To submit this job, load Gaussian then run the provided script, named subg09. This job uses one compute node with 16 processor cores:

$ module load gaussian09
$ subg09 myjob -l nodes=1:ppn=16

View job status:

$ qstat -u myusername

View results in the file for Gaussian output, here named myjob.log. Only the first and last few lines appear here:

 
Entering Gaussian System, Link 0=/apps/rhel6/g09-D.01/g09/g09
 Initial command:
 /apps/rhel6/g09-D.01/g09/l1.exe /scratch/rice/m/myusername/gaussian/Gau-7781.inp -scrdir=/scratch/rice/m/myusername/gaussian/
 Entering Link 1 = /apps/rhel6/g09-D.01/g09/l1.exe PID=      7782.

 Copyright (c) 1988,1990,1992,1993,1995,1998,2003,2009,2010,
            Gaussian, Inc.  All Rights Reserved.

.
.
.

 Job cpu time:  0 days  0 hours  1 minutes 37.3 seconds.
 File lengths (MBytes):  RWF=      5 Int=      0 D2E=      0 Chk=      1 Scr=      1
 Normal termination of Gaussian 09 at Wed Mar 30 10:49:02 2011.
real 17.11
user 92.40
sys 4.97
Machine:
carter-a389
carter-a389
carter-a389
carter-a389
carter-a389
carter-a389
carter-a389
carter-a389

Examples of Gaussian PBS Job Submissions

Submit job using 16 processor cores on a single node:

$ subg09 myjob -l nodes=1:ppn=16,walltime=200:00:00 -q myqueuename

Submit job using 16 processor cores on each of 2 nodes:

$ subg09 myjob -l nodes=2:ppn=16,walltime=200:00:00 -q myqueuename



Maple

Maple is a general-purpose computer algebra system. This section illustrates how to submit a small Maple job to a PBS queue. This Maple example differentiates, integrates, and finds the roots of polynomials.

Prepare a Maple input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Differentiate wrt x.
diff( 2*x^3,x );

# Integrate wrt x.
int( 3*x^2*sin(x)+x,x );

# Solve for x.
solve( 3*x^2+2*x-1,x );

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load maple
cd $PBS_O_WORKDIR

# Use the -q option to suppress startup messages.
# maple -q myjob.in
maple myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=16 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

                                         2
                                      6 x

                                                           2
                      2                                   x
                  -3 x  cos(x) + 6 cos(x) + 6 x sin(x) + ----
                                                          2

                                    1/3, -1

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Maple:


Mathematica

Mathematica implements numeric and symbolic mathematics. This section illustrates how to submit a small Mathematica job to a PBS queue. This Mathematica example finds the three roots of a third-degree polynomial.

Prepare a Mathematica input file with an appropriate filename, here named myjob.in:

(* FILENAME:  myjob.in *)

(* Find roots of a polynomial. *)
p=x^3+3*x^2+3*x+1
Solve[p==0]
Quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load mathematica
cd $PBS_O_WORKDIR

math < myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=16 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

Mathematica 5.2 for Linux x86 (64 bit)
Copyright 1988-2005 Wolfram Research, Inc.
 -- Terminal graphics initialized --

In[1]:=
In[2]:=
In[2]:=
In[3]:=
                     2    3
Out[3]= 1 + 3 x + 3 x  + x

In[4]:=
Out[4]= {{x -> -1}, {x -> -1}, {x -> -1}}

In[5]:=

View the standard error file, myjob.sub.emyjobid:

rmdir: ./ligo/rengel/tasks: Directory not empty
rmdir: ./ligo/rengel: Directory not empty
rmdir: ./ligo: Directory not empty

For more information about Mathematica:

Matlab

MATLAB® (MATrix LABoratory) is a high-level language and interactive environment for numerical computation, visualization, and programming. MATLAB is a product of MathWorks.

MATLAB, Simulink, Compiler, and several of the optional toolboxes are available to faculty, staff, and students. To see the kind and quantity of all MATLAB licenses, plus the number that you are currently using, use the matlab_licenses command:

$ module load matlab
$ matlab_licenses

The MATLAB client can be run on the front-end for application development; however, computationally intensive jobs must be run on compute nodes.

The following sections provide several examples illustrating how to submit MATLAB jobs to a Linux compute cluster.

    Matlab Script (.m File)
    Implicit Parallelism
    Profile Manager
    Parallel Computing Toolbox (parfor)
    Parallel Toolbox (spmd)
    Distributed Computing Server (parallel job)


Matlab Script (.m File)

This section illustrates how to submit a small, serial, MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and gets three random numbers.

Prepare a MATLAB script myscript.m, and a MATLAB function file myfunction.m:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name);

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

Also, prepare a job submission file, here named myjob.sub. Run with the name of the script:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"

# Load module, and set up environment for Matlab to run
module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
# Run Matlab, with the above options and specifying our .m file
matlab -nodisplay -singleCompThread -r myscript

Submit the job as a single compute node:

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

carter-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
197986.carter-adm  myusername   standby myjob.sub    4645   1   1    --  00:01 R 00:00

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
carter-a001.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

hostname:carter-a001.rcac.purdue.edu

0.814724 0.905792 0.126987

Output shows that a processor core on one compute node (carter-a001) processed the job. Output also displays the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about MATLAB:


Implicit Parallelism

MATLAB implements implicit parallelism which is automatic multithreading of many computations, such as matrix multiplication, linear algebra, and performing the same operation on a set of numbers. This is different from the explicit parallelism of the Parallel Computing Toolbox.

MATLAB offers implicit parallelism in the form of thread-parallel enabled functions. Since the processor cores of a compute node share a common memory, many MATLAB functions can take advantage of multithreading. Vector operations, the particular application or algorithm, and the amount of computation (array size) determine whether a function runs serially or with multithreading.

When your job triggers implicit parallelism, it attempts to allocate its threads on all processor cores of the compute node on which the MATLAB client is running, including processor cores running other jobs. This competition can degrade the performance of all jobs running on the node.

When you know that you are coding a serial job but are unsure whether you are using thread-parallel enabled operations, run MATLAB with implicit parallelism turned off. Beginning with R2009b, you can turn multithreading off by starting MATLAB with -singleCompThread:

$ matlab -nodisplay -singleCompThread -r mymatlabprogram

When you are using implicit parallelism, request exclusive access to a compute node by requesting all cores which are physically available on a node:

$ qsub -l nodes=1:ppn=16,walltime=00:01:00 myjob.sub

For more information about MATLAB's implicit parallelism:


Profile Manager

MATLAB offers two kinds of profiles for parallel execution: the 'local' profile and user-defined cluster profiles. The 'local' profile runs a MATLAB job on the processor core(s) of the same compute node, or front-end, that is running the client. To run a MATLAB job on compute node(s) different from the node running the client, you must define a Cluster Profile using the Cluster Profile Manager.

To prepare a user-defined cluster profile, use the Cluster Profile Manager in the Parallel menu. This profile contains the PBS details (queue, nodes, ppn, walltime, etc.) of your job submission. Ultimately, your cluster profile will be an argument to MATLAB functions like batch().

For your convenience, ITaP provides a generic cluster profile that can be downloaded: mypbsprofile.settings

Please note that modifications may be required to make mypbsprofile.settings work on some clusters. You may need to change nodes, ppn, walltime, and queuename values specified in the file. Proceed to import this profile only after you have made these changes.

To import the profile, start a MATLAB session and select Manage Cluster Profiles... from the Parallel menu. In the Cluster Profile Manager, select Import, navigate to the folder containing the profile, select mypbsprofile.settings and click OK. Remember that the profile will need to be customized for your specific needs. If you have any questions, please contact us.
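
If you prefer to skip the graphical Cluster Profile Manager, a command along the following lines may also work (a sketch, assuming mypbsprofile.settings is in the current directory and your MATLAB release provides parallel.importProfile):

$ matlab -nodisplay -r "parallel.importProfile('mypbsprofile.settings'); parallel.defaultClusterProfile('mypbsprofile'); exit"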

For detailed information about MATLAB's Parallel Computing Toolbox, examples, demos, and tutorials:


Parallel Computing Toolbox (parfor)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. It offers a shared-memory computing environment with a maximum of 12 workers (labs, threads; starting in version R2011a) running on the local cluster profile in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses.

This section illustrates the fine-grained parallelism of a parallel for loop (parfor) in a pool job.

The following examples illustrate a method for submitting a small, parallel, MATLAB program with a parallel loop (parfor statement) as a job to a PBS queue. This MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each iteration of the parfor loop.

This method uses the qsub command to submit a MATLAB client which calls the MATLAB batch() function with a user-defined cluster profile.

Prepare a MATLAB pool program in a MATLAB script with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
numlabs = parpool('poolsize');
fprintf('        hostname                         numlabs  labindex  iteration\n')
fprintf('        -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;        % get elapsed time in parallel loop
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)

The execution of a pool job starts with a worker executing the statements of the first serial region up to the parfor block, when it pauses. A set of workers (the pool) executes the parfor block. When they finish, the first worker resumes by executing the second serial region. The code displays the names of the compute nodes running the batch session and the worker pool.

Prepare a MATLAB script that calls MATLAB function batch() which makes a four-lab pool on which to run the MATLAB code in the file myscript.m. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

pjob=batch('myscript','Profile','mypbsprofile','Pool',4,'CaptureDiary',true);
wait(pjob);
diary(pjob);
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

matlab -nodisplay -r mylclbatch

Submit the job as a single compute node with one processor core and request one PCT license:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=01:00:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m.

Once this job starts, the batch() call submits a second PBS job for the pool of workers (it appears as Job1 in the qstat output below).

View job status:

$ qstat -u myusername

carter-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
199025.carter-adm  myusername   standby myjob.sub   30197   1   1    --  01:00:00 R 00:00:00
199026.carter-adm  myusername   standby Job1          668   4   4    --  01:00:00 R 00:00:00

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
carter-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013a (8.1.0.604) 64-bit (glnxa64)
                             February 15, 2013

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

mylclbatch.m
carter-a000.rcac.purdue.edu
SERIAL REGION:  hostname:carter-a000.rcac.purdue.edu

                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  carter-a001.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  carter-a002.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  carter-a001.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  carter-a002.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  carter-a003.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  carter-a003.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  carter-a004.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  carter-a004.rcac.purdue.edu            4         1          8

SERIAL REGION:  hostname:carter-a000.rcac.purdue.edu
Elapsed time in parallel loop:   5.411486

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsprofile by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments.
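
For example, to allow up to four hours (an illustrative value), the qsub request from above could become the following, with a matching walltime entered in the profile's SubmitArguments property (e.g. -l walltime=04:00:00):

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=04:00:00,gres=Parallel_Computing_Toolbox+1 myjob.sub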

For more information about MATLAB Parallel Computing Toolbox:


Parallel Toolbox (spmd)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. It offers a shared-memory computing environment with a maximum of eight MATLAB workers (labs, threads; versions R2009a) and 12 workers (labs, threads; version R2011a) running on the local configuration in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses.

This section illustrates how to submit a small, parallel, MATLAB program with a parallel region (spmd statement) as a batch, MATLAB pool job to a PBS queue.

This example uses the PBS qsub command to submit to the compute nodes a MATLAB client which interprets a MATLAB .m file with a user-defined PBS cluster profile that scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the spmd statement. This job runs completely off the front-end.

Prepare a MATLAB script called myscript.m:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
p = parpool(4);
fprintf('                    hostname                         numlabs  labindex\n')
fprintf('                    -------------------------------  -------  --------\n')
tic;

% PARALLEL REGION
spmd
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL REGION:  %-31s  %7d  %8d\n', name,numlabs,labindex)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel region
delete(p);
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel region:   %f\n', elapsed_time)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of the script:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

matlab -nodisplay -r myscript

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> parallel.defaultClusterProfile('mypbsprofile');
>> quit;
$

Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

$ qsub -l nodes=1:ppn=1,naccesspolicy=shared,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

Once this job starts, a second job submission is made.

View job status:

$ qstat -u myusername

carter-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
332026.carter-adm  myusername   standby myjob.sub   31850   1   1    --  00:01 R 00:00
332028.carter-adm  myusername   standby Job1          668   4   4    --  00:01 R 00:00

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
carter-a001.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

SERIAL REGION:  hostname:carter-a001.rcac.purdue.edu

Starting matlabpool using the 'mypbsprofile' profile ... connected to 4 labs.
                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 2:
  PARALLEL REGION:  carter-a002.rcac.purdue.edu            4         2
Lab 1:
  PARALLEL REGION:  carter-a001.rcac.purdue.edu            4         1
Lab 3:
  PARALLEL REGION:  carter-a003.rcac.purdue.edu            4         3
Lab 4:
  PARALLEL REGION:  carter-a004.rcac.purdue.edu            4         4

Sending a stop signal to all the labs ... stopped.

SERIAL REGION:  hostname:carter-a001.rcac.purdue.edu
Elapsed time in parallel region:   3.382151

Output shows the name of one compute node (a001) that processed the job submission file myjob.sub and the two serial regions. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a001,a002,a003,a004) that processed the four parallel regions. The total elapsed time demonstrates that the jobs ran in parallel.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about MATLAB Parallel Computing Toolbox:


Distributed Computing Server (parallel job)

The MATLAB Parallel Computing Toolbox (PCT) enables a parallel job via the MATLAB Distributed Computing Server (DCS). The tasks of a parallel job are identical, run simultaneously on several MATLAB workers (labs), and communicate with each other. This section illustrates an MPI-like program.

This section illustrates how to submit a small, MATLAB parallel job with four workers running one MPI-like task to a PBS queue. The MATLAB program broadcasts an integer to four workers and gathers the names of the compute nodes running the workers and the lab IDs of the workers.

This example uses the PBS qsub command to submit a MATLAB script with a user-defined PBS cluster profile that scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the parallel job. This job runs completely off the front-end.

Prepare a MATLAB script named myscript.m :

% FILENAME:  myscript.m

% Specify pool size.
% Convert the parallel job to a pool job.
parpool(4);
spmd

if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if labindex == 1
    disp(result)
end

end   % spmd
delete(gcp);
quit;

Also, prepare a job submission file, here named myjob.sub. Run with the name of the script:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r myscript

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit;
$

Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

Once this job starts, a second job submission is made.

View job status:

$ qstat -u myusername

carter-adm.rcac.purdue.edu:
                                                                   Req'd  Req'd   Elap
Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
465534.carter-adm myusername   standby myjob.sub    5620   1   1    --  00:05 R 00:00
465545.carter-adm myusername   standby Job2          --    4   4    --  00:01 R   --

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
carter-a006.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
Lab 1:
  carter-a006.rcac.purdue.edu:4:1:1000
  carter-a007.rcac.purdue.edu:4:2:1000
  carter-a008.rcac.purdue.edu:4:3:1000
  carter-a009.rcac.purdue.edu:4:4:1000
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows the name of one compute node (a006) that processed the job submission file myjob.sub. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a006,a007,a008,a009) that processed the four parallel regions.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments.

For more information about parallel jobs:


Octave

GNU Octave is a high-level, interpreted, programming language for numerical computations. Octave is a structured language (similar to C) and mostly compatible with MATLAB. You may use Octave to avoid the need for a MATLAB license, both during development and as a deployed application. By doing so, you may be able to run your application on more systems or more easily distribute it to others.

This section illustrates how to submit a small Octave job to a PBS queue. This Octave example computes the inverse of a matrix.

Prepare an Octave script file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Invert matrix A.
A = [1 2 3; 4 5 6; 7 8 0]
inv(A)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR

unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

The command octave myjob.m (without the redirection) also works in the preceding script.

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

A =

   1   2   3
   4   5   6
   7   8   0

ans =

  -1.77778   0.88889  -0.11111
   1.55556  -0.77778   0.22222
  -0.11111   0.22222  -0.11111

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Octave:


Perl

Perl is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Perl job to a PBS queue. This Perl example prints a single line of text.

Prepare a Perl input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

print "hello, world\n"

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -w option to issue warnings.
/usr/bin/perl -w myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Perl:


Python

Python is a high-level, general-purpose, interpreted, dynamic programming language. We suggest using Anaconda, a Python distribution made for large-scale data processing, predictive analytics, and scientific computing. This section illustrates how to submit a small Python job to a PBS queue. This Python example prints a single line of text.

Example 1: Hello world

Prepare a Python input file with an appropriate filename, here named hello.py:

# FILENAME:  hello.py

import string, sys
print "Hello, world"

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load anaconda
cd $PBS_O_WORKDIR

python hello.py

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.

Example 2: Matrix multiply

Save the following script as matrix.py:

# Matrix multiplication program

x = [[3,1,4],[1,5,9],[2,6,5]]
y = [[3,5,8,9],[7,9,3,2],[3,8,4,6]]

result = [[sum(a*b for a,b in zip(x_row,y_col)) for y_col in zip(*y)] for x_row in x]

for r in result:
        print(r)

Change the last line in the job submission file above to read:

python matrix.py

The standard output file from this job will result in the following matrix:

[28, 56, 43, 53]
[65, 122, 59, 73]
[63, 104, 54, 60]

Example 3: Sine wave plot

Save the following script as sine.py:

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pylab as plt

x = np.linspace(-np.pi, np.pi, 201)
plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.savefig('sine.png')

Change your job submission file to submit this script; the job will produce a PNG file named sine.png and empty standard output and error files.
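
For example, the submission file from Example 1 could be adapted as follows (a sketch, assuming the script is saved as sine.py in the job's working directory and that numpy and matplotlib are provided by the anaconda module):

#!/bin/sh -l
# FILENAME:  myjob.sub

module load anaconda
cd $PBS_O_WORKDIR

python sine.py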

Installing packages with pip

Pip is a package manager that you can use to install packages permanently in your user profile. Packages installed this way are available to all of your Python scripts without further setup, but be aware of what you have installed, as commands from different packages may start to overlap.

Install packages to your user profile:

$ pip install --user PackageName

Check which packages are installed globally:

$ pip list

Check which packages you have personally installed:

$ pip list --user

Snapshot installed packages:

$ pip freeze > requirements.txt

Install packages from package snapshot:

$ pip install --user -r requirements.txt

Installing packages from source

We maintain several Anaconda installations. Anaconda bundles numerous popular scientific Python libraries in a single installation. If you need a Python library not included with normal Python, we recommend first checking Anaconda. For a list of modules currently installed in the Anaconda Python distribution:

$ module load anaconda
$ conda list
# packages in environment at /apps/rhel6/Anaconda-2.0.1:
#
_license                  1.1                      py27_0
anaconda                  2.0.1                np18py27_0
...

If you see the library you need you will be able to import that library into your Python code as normal with that Anaconda module loaded.

If you don't find the package you need in the Anaconda installation, you should be able to install the library under your own home directory. Use the following instructions as a guideline. Make sure you have a download link to the software (usually it will be a tar.gz archive file); substitute it on the wget line below.

$ mkdir ~/src
$ cd ~/src
$ wget http://path/to/source/tarball/app-1.0.tar.gz
$ tar xzvf app-1.0.tar.gz
$ cd app-1.0
$ module load anaconda
$ python setup.py install --user
$ cd ~
$ python
>>> import app
>>> quit()

The "import app" line should return without any output if installed successfully. You can then import the package in your python scripts.

If you need further help or run into any issues installing a library contact us at rcac-help@purdue.edu or drop by Coffee Hour for in-person help.

For more information about Python:


R

R, a GNU project, is a language and environment for statistics and graphics. It is an open source version of the S programming language. This section illustrates how to submit a small R job to a PBS queue. This R example computes a Pythagorean triple.

Prepare an R input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load r
cd $PBS_O_WORKDIR

# --vanilla: run R without reading startup files and without restoring or saving the workspace
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.in

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # FILENAME:  myjob.in
>
> # Compute a Pythagorean triple.
> a = 3
> b = 4
> c = sqrt(a*a + b*b)
> c     # display result
[1] 5
>

Any output written to standard error will appear in myjob.sub.emyjobid.

Installing Packages

To install additional R packages, create a folder in your home directory called Rlibs. You will need to be running a recent version of R (2.14.0 or greater as of this writing):

$ mkdir ~/Rlibs

If you are running the bash shell (the default on our clusters), add the following line to your .bashrc (Create the file ~/.bashrc if it doesn't already exist. You may also need to run "ln -s .bashrc .bash_profile" if .bash_profile doesn't exist either):

export R_LIBS=~/Rlibs:$R_LIBS

If you are running csh or tcsh, add the following to your .cshrc:

setenv R_LIBS ~/Rlibs:$R_LIBS

Now run "source .bashrc" and start R:

$ module load r
$ R
> .libPaths()
[1] "/home/myusername/Rlibs"
[2] "/apps/rhel6/R/3.1.0/lib64/R/library"

.libPaths() should output something similar to above if it is set up correctly. Now let's try installing a package.

> install.packages('packagename',"~/Rlibs","http://cran.case.edu/")

The above command should download and install the requested R package, which upon completion can then be loaded.

> library('packagename')

If your R package relies on a library that's only installed as a module (for this example we'll use GDAL), you can install it by doing the following:

$ module load gdal
$ module load r
$ R
> install.packages('rgdal',"~/Rlibs","http://cran.case.edu/",
      configure.args="--with-gdal-include=$GDAL_HOME/include --with-gdal-lib=$GDAL_HOME/lib")

Repeat install.packages(...) for any packages that you need. Your R packages should now be installed.

For more information about R:


SAS

SAS is an integrated system supporting statistical analysis, report generation, business planning, and forecasting. This section illustrates how to submit a small SAS job to a PBS queue. This SAS example displays a small dataset.

Prepare a SAS input file with an appropriate filename, here named myjob.sas:

* FILENAME:  myjob.sas

/* Display a small dataset. */
TITLE 'Display a Small Dataset';
DATA grades;
INPUT name $ midterm final;
DATALINES;
Anne     61 64
Bob      71 71
Carla    86 80
David    79 77
Eduardo  73 73
Fannie   81 81
;
PROC PRINT data=grades;
RUN;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load sas
cd $PBS_O_WORKDIR

# -stdio:   run SAS in batch mode:
#              read SAS input from stdin
#              write SAS output to stdout
#              write SAS log to stderr
# -nonews:  do not display SAS news
# SAS runs in batch mode when the name of the SAS command file
# appears as a command-line argument.
sas -stdio -nonews myjob

Submit the job:

$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

                                                           The SAS System                       10:59 Wednesday, January 5, 2011   1

                                                 Obs    name       midterm    final

                                                  1     Anne          61        64
                                                  2     Bob           71        71
                                                  3     Carla         86        80
                                                  4     David         79        77
                                                  5     Edwardo       73        73
                                                  6     Fannie        81        81

View the SAS log in the standard error file, myjob.sub.emyjobid:

1                                                          The SAS System                           12:32 Saturday, January 29, 2011

NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.2 (TS2M0)
      Licensed to PURDUE UNIVERSITY - T&R, Site 70063312.
NOTE: This session is executing on the Linux 2.6.18-194.17.1.el5rcac2 (LINUX) platform.

NOTE: SAS initialization used:
      real time           0.70 seconds
      cpu time            0.03 seconds

1          * FILENAME:  myjob.sas
2
3          /* Display a small dataset. */
4          TITLE 'Display a Small Dataset';
5          DATA grades;
6          INPUT name $ midterm final;
7          DATALINES;

NOTE: The data set WORK.GRADES has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.18 seconds
      cpu time            0.01 seconds

14         ;
15         PROC PRINT data=grades;
16         RUN;

NOTE: There were 6 observations read from the data set WORK.GRADES.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.32 seconds
      cpu time            0.04 seconds

NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           1.28 seconds
      cpu time            0.08 seconds
For more information about SAS:


Singularity

What is Singularity?

Singularity is a new feature of the Community Clusters allowing the portability and reproducibility of operating system and application environments through the use of Linux containers. It gives users complete control over their environment.

Singularity is like Docker but tuned explicitly for HPC clusters.

More information is available from the project’s website.

Features

  • Run the latest applications on an Ubuntu or Centos userland
  • Gain access to the latest developer tools
  • Launch MPI programs easily
  • Much more

Singularity’s user guide is available at: singularity.lbl.gov/user-guide

Example

Here is an example using an Ubuntu 16.04 userland on Radon:

support@radon-fe00:~$ singularity exec /depot/itap/singularity/ubuntu1604.img cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"

Here is another example using a Centos 7 userland on our Redhat Enterprise 6 clusters:

support@radon-fe00:~$ singularity exec /depot/itap/singularity/centos7.img cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core) 

RCAC Specific Notes

All service providers will integrate Singularity slightly differently depending on site. The largest customization will be which default files are inserted into your images so that routine services will work.

Services we configure for your images include DNS settings and account information. File systems we overlay into your images are your home directory, scratch, Data Depot, and application file systems.

Here is a list of paths:

  • /etc/resolv.conf
  • /etc/hosts
  • /home/$USER
  • /apps
  • /scratch
  • /depot
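
Because your home and scratch directories are available inside the image, a containerized program can also be run from a normal PBS job script. A minimal sketch, using one of the example images above (./my_program is a hypothetical executable in your working directory):

#!/bin/sh -l
cd $PBS_O_WORKDIR

# Run a hypothetical program inside the Ubuntu 16.04 example image.
singularity exec /depot/itap/singularity/ubuntu1604.img ./my_program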

Changing Singularity images

Due to limitations in Linux, Singularity images must be changed by users on their own systems. You can find information and documentation for this process at:

You can also find base images for Centos and Ubuntu on any front-end if you desire to perform customizations by hand: /depot/itap/singularity


Spark

Spark is a fast and general engine for large-scale data processing. This section walks through how to submit and run a Spark job using PBS on the compute nodes of Carter.

pbs-spark-submit launches an Apache Spark program within a PBS job: it starts the Spark master and worker processes in standalone mode, runs a user-supplied Spark job, and then stops the Spark master and worker processes. The Spark program and its associated services are constrained by the resource limits of the job and are killed off when the job ends. This effectively allows PBS to act as a Spark cluster manager. The following steps assume that you have a Spark program that can run without errors.

To use Spark and pbs-spark-submit, you need to load the following two modules to set up the SPARK_HOME and PBS_SPARK_HOME environment variables:

module load spark
module load pbs-spark-submit

The following example submission script serves as a template for building your own, more complex Spark job submissions. This job requests 2 whole compute nodes for 10 minutes, and submits to the standby queue.
#PBS -N spark-pi
#PBS -l nodes=2:ppn=16

#PBS -l walltime=00:10:00
#PBS -q standby
#PBS -o spark-pi.out
#PBS -e spark-pi.err

cd $PBS_O_WORKDIR
module load spark
module load pbs-spark-submit
pbs-spark-submit $SPARK_HOME/examples/src/main/python/pi.py 1000
The following line in the submission script above submits the pi.py program to the nodes that are allocated to your job:

pbs-spark-submit $SPARK_HOME/examples/src/main/python/pi.py 1000

You can set various environment variables in your submission script to change the settings of the Spark program. For example, the following line sets SPARK_LOG_DIR to $HOME/log; the default value is the current working directory.

export SPARK_LOG_DIR=$HOME/log

The same environment variables can be set via pbs-spark-submit command-line arguments. For example, the following line sets SPARK_LOG_DIR to $HOME/log2:

pbs-spark-submit --log-dir $HOME/log2

The following table summarizes the environment variables that can be set. Note that values set via command-line arguments override those set via shell export, and values set via shell export override the system defaults.
Environment Variable      Default                     Shell Export                                  Command Line Args
SPARK_CONF_DIR            $SPARK_HOME/conf            export SPARK_CONF_DIR=$HOME/conf              --conf-dir or -C
SPARK_LOG_DIR             Current working directory   export SPARK_LOG_DIR=$HOME/log                --log-dir or -L
SPARK_LOCAL_DIR           /tmp                        export SPARK_LOCAL_DIR=$RCAC_SCRATCH/local    NA
SCRATCHDIR                Current working directory   export SCRATCHDIR=$RCAC_SCRATCH/scratch       --work-dir or -d
SPARK_MASTER_PORT         7077                        export SPARK_MASTER_PORT=7078                 NA
SPARK_DAEMON_JAVA_OPTS    None                        export SPARK_DAEMON_JAVA_OPTS="-Dkey=value"   -D key=value
Note that SCRATCHDIR must be a shared scratch directory accessible from all nodes of a job. In addition, pbs-spark-submit supports command-line arguments to change the properties of the Spark daemons and the Spark jobs. For example, the --no-stop argument tells Spark not to stop the master and worker daemons after the Spark application is finished, and the --no-init argument tells Spark not to initialize the Spark master and worker processes. These are intended for use in a sequence of invocations of Spark programs within the same job.

pbs-spark-submit --no-stop   $SPARK_HOME/examples/src/main/python/pi.py 800
pbs-spark-submit --no-init   $SPARK_HOME/examples/src/main/python/pi.py 1000

Use the following command to see the complete list of command-line arguments:

pbs-spark-submit -h

To learn programming in Spark, refer to the Spark Programming Guide. To learn how to submit Spark applications, refer to Submitting Applications.


Windows

Windows 10 virtual machines are supported as batch jobs on HPC systems. This section illustrates how to submit a job and run a Windows instance in order to run Windows applications. The pre-configured Windows image has access to your Carter scratch directory, and a link to Research Data Depot.

  • Log in via Thinlinc.
  • SSH to Carter, using the -Y option for X11 forwarding.
    $ ssh -Y carter.rcac.purdue.edu
    
  • Submit an interactive PBS job:
    $ qsub -X -I -l walltime=8:00:00 -l nodes=1:ppn=16
    
  • Load the "qemu" module:
    $ module load qemu
    
  • Run the "windows10" script:
    $ windows10
    

The Windows 10 desktop will open, and automatically log in as an administrator. Changes to the image will be stored in a file in your Carter scratch directory.

The Windows 10 image available on Carter has the following software packages preloaded:

  • ArcGIS Desktop 10.4
  • ArcGIS Pro
  • Anaconda Python 2 and Python 3
  • ENVI5.3/IDL 8.5
  • ERDAS Imagine
  • GRASS GIS 7.2.1
  • JMP 13
  • Maple 2016
  • Matlab R2016b
  • Microsoft Office 2016
  • Notepad++
  • Pix4d Mapper
  • QGIS Desktop
  • Rstudio

cannot connect to X server

Problem

You receive the following message after entering a command to bring up a graphical window

cannot connect to X server

Solution

This can happen due to multiple reasons:

  • Reason 1: Your SSH client software does not support graphical display by itself (e.g. SecureCRT or PuTTY).
    • Solution: Try using a client software like Thinlinc or MobaXTerm as described here.
  • Reason 2: You did not enable X11 forwarding in your SSH connection.

    • Solution: If you are in a Windows environment, make sure that X11 forwarding is enabled in your connection settings (e.g. in MobaXTerm or PuTTY). If you are in a Linux environment, try

      ssh -Y -l username hostname

  • Reason 3: If you are trying to open a graphical window within an interactive job, make sure you are using the -X option with qsub after following the previous step(s) for connecting to the front-end. Please see the example here.
  • Reason 4: If none of the above apply, make sure that you are within quota of your home directory as described here.

E233: cannot open display

Problem

You receive the following message after entering a command to bring up a graphical window

E233: cannot open display

Solution

This means you did not enable X11 forwarding which supports remote graphical access to applications. Try

ssh -Y -l username hostname

How do I check my job output while it is running

Problem

After submitting your job to the cluster, you want to see the output that it generates.

Solution

There are two simple ways to do this:

  • qpeek: Use the tool qpeek to check the job's output. Syntax of the command is:
    qpeek <jobid>
  • Redirect your output to a file: To do this you need to edit the main command in your jobscript as shown below. Please note the redirection command starting with the greater than (>) sign.
    myapplication ...other arguments... > "${PBS_JOBID}.output"
    On any front-end, go to the working directory of the job and scan the output file.
    tail "<jobid>.output"
    Make sure to replace <jobid> with an appropriate jobid.

bash: command not found

Problem

You receive the following message after typing a command

bash: command not found

Solution

This means the system does not know how to find your command. Typically, you need to load a module to make the command available.
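
For example, if the matlab command is not found, locating and loading the corresponding module usually resolves it:

$ module avail          # list the available software modules
$ module load matlab    # make the matlab command available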

qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu

Problem

You receive the following message after attempting to delete a job with the 'qdel' command

qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu

Solution

This error usually indicates that at least one node running your job has stopped responding or crashed. Please forward the job ID to rcac-help@purdue.edu, and ITaP Research Computing staff can help remove the job from the queue.

bash: module command not found

Problem

You receive the following message after typing a command, e.g. module load intel

bash: module command not found

Solution

The system cannot find the module command. You need to source the modules.sh file as shown below:

source /etc/profile.d/modules.sh

or use an interactive shell in the first line of your job script:

#!/bin/bash -i
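
For example, a job script that sources the initialization file before its first module load might look like this (a minimal sketch):

#!/bin/bash
# Make the module command available in this script, then load a module.
source /etc/profile.d/modules.sh
module load intel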

/usr/bin/xauth: error in locking authority file

Problem

I receive this message when logging in:

/usr/bin/xauth: error in locking authority file

Solution

Your home directory disk quota is full. You may check your quota with myquota.

You will need to free up space in your home directory.

1234.carter-adm.rcac.purdue.edu.SC: line 12: 12345 Killed

Problem

Your PBS job stopped running and you received an email with the following:

/var/spool/torque/mom_priv/jobs/1234.carter-adm.rcac.purdue.edu.SC: line 12: 12345 Killed <command name>

Solution

This means that the node your job was running on ran out of memory to support your program or code. This may be due to your job or other jobs sharing your node(s) consuming more memory in total than is available on the node. Your program was killed by the operating system on the node to protect the node itself. There are two possible causes:

  • On Carter, jobs using fewer than 16 cores per node share the node(s) with other jobs by default. Either your job or one of the other jobs running on the node consumed too much memory. You should request all cores of the node or request exclusive access, as shown below; exclusive access gives you full control over all the memory on the node.
  • You requested exclusive access to the nodes but your job requires more memory than is available on the node. You should use more nodes if your job supports MPI, or run a smaller dataset.
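
For example, a request for all 16 cores of a Carter node, or for single-user access, might look like one of the following (the walltime is illustrative):

$ qsub -l nodes=1:ppn=16,walltime=04:00:00 myjob.sub
$ qsub -l nodes=1:ppn=1,naccesspolicy=singleuser,walltime=04:00:00 myjob.sub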

My SSH connection hangs

Problem

Your console hangs while trying to connect to a RCAC Server.

Solution

This can happen for various reasons. The most common reasons for a hanging SSH terminal are:

  • Network: If you are connected over wifi, make sure that your Internet connection is fine.
  • Busy front-end server: When you connect to a cluster, you SSH to one of the front-ends. Due to transient user loads, one or more of the front-ends may become unresponsive for a short while. To avoid this, try reconnecting to the cluster or wait until the server you have connected to has reduced load.
  • File system issue: If a server has issues with one or more of the file systems (home, scratch, or depot) it may freeze your terminal. To avoid this you can connect to another front-end.

If neither of the suggestions above work, please contact rcac-help@purdue.edu specifying the name of the server where your console is hung.


What is the "debug" queue?

The debug queue allows you to quickly start small, short, interactive jobs in order to debug code, test programs, or test configurations. You are limited to one running job at a time in the queue, and you may use up to two compute nodes for up to 30 minutes.
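
For example, a short interactive session within those limits could be requested as follows (a sketch; adjust the resource request to your needs):

$ qsub -I -q debug -l nodes=2:ppn=16,walltime=00:30:00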


How can my collaborators outside Purdue get access to Carter?

Your Departmental Business Office can submit a Request for Privileges (R4P) to provide access to collaborators outside Purdue, including recent graduates. Once the R4P process is complete, you will need to add your outside collaborators to Carter as you would for any Purdue collaborator.

How can I get email alerts about my PBS job status?

Question

How can I be notified when my PBS job is executed and whether it completed successfully?

Answer

Submit your job with the following command line arguments

qsub -M email_address -m bea myjobsubmissionfile

Or, include the following in your job submission file.

#PBS -M email_address                                                  
#PBS -m bae                                                                         

The -m option can have the following letters: "a", "b", and "e":

a - mail is sent when the job is aborted by the batch system.
b - mail is sent when the job begins execution.
e - mail is sent when the job terminates.


Can I extend the walltime of a PBS job on Carter?

In some circumstances, yes. Walltime extensions must be requested of and completed by Research Computing staff. Walltime extension requests will be considered on named (your advisor or research lab) queues. Standby or debug queue jobs cannot be extended.

Extension requests are at the discretion of Research Computing staff based on factors such as any upcoming maintenance (e.g., we can extend up until maintenance but not into or past it) and resource availability. Extensions can be made past the normal maximum walltime but these jobs are subject to early termination should a conflicting maintenance downtime be scheduled.

We ask that you make accurate walltime requests during job submission. Accurate walltimes allow the job scheduler to schedule jobs on the cluster efficiently and quickly. When making initial job requests and extension requests, please consider that extensions can impact scheduling efficiency for all users of the cluster.

Requests can be made to rcac-help@purdue.edu. We ask that you:

  • Provide numerical job IDs, cluster name, and your desired extension amount.
  • Provide at least 24 hours notice before job will end (more if request is made on a weekend or holiday).
  • Consider making requests during business hours. We may not be able to respond in time to requests made after-hours, on a weekend, or on a holiday.


Do I need to do anything to my firewall to access Carter?

No firewall changes are needed to access Carter. However, to access data through Network Drives (i.e., CIFS, "Z: Drive"), you must be on a Purdue campus network or connected through VPN.

My scratch files were purged. Can I retrieve them?

Unfortunately, once files are purged, they are permanently removed and cannot be retrieved. Notices of pending purges are sent one week in advance to your Purdue email address. Be sure to check your Purdue email regularly, or set up forwarding to an account you do check frequently.

Can you tell me what files were purged?

You can see a list of removed files with the lastpurge command. The command accepts a -n option to specify how many purges (weeks) back to look.
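
For example, to look back two purges (roughly two weeks), you might run the following; this is a sketch based on the -n option described above.

lastpurge -n 2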

Question

How can I get access to Sentaurus tools for micro- and nano-electronics design?

Answer

The Sentaurus software license requires a signed NDA. Please contact Dr. Mark Johnson, Director of ECE Instructional Laboratories, to complete the process.

Once the licensing process is complete and you have been added to the cae2 Unix group, you can use Sentaurus on RCAC community clusters by loading the corresponding environment module:

module load sentaurus
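
As a quick sanity check before loading the module, you can confirm that your cae2 group membership is active (a sketch using standard Unix tools):

groups | grep cae2    # should print a line containing cae2 once access has been granted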

Can I share data with outside collaborators?

Yes! Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation for instructions on sharing data.

Question

Can I get a private (virtual or physical) server from RCAC?

Answer

Researchers often want a private server to run databases, web servers, or other software. RCAC does not currently offer private servers (a service formerly known as "Firebox").

For use cases like this, we recommend the Jetstream Cloud (http://jetstream-cloud.org/), an NSF-funded science cloud allocated through the XSEDE project. RCAC staff can help you get trial access to Jetstream, or help you write an allocation proposal for larger projects.

Alternatively, you may consider commercial cloud providers such as Amazon Web Services, Azure, or Digital Ocean. These services are very flexible, but do come with a monetary cost.

Portrait of Dennis Lee Carter

Dennis Lee Carter

In two words and five musical notes, Dennis Lee Carter helped make Intel microprocessors, like those in the Carter Community Cluster, a familiar and trusted brand, not just among engineers but also among computer-buying consumers around the world. In turn, that helped spur the widespread adoption of personal computers and all that followed.

Carter, who earned his master's degree in electrical engineering from Purdue in 1974, is credited with creating and implementing the internationally recognized "Intel Inside" campaign. The effort developed brand awareness of the microprocessor as the key ingredient in a personal computer. It also put Intel's logo on the outside of the vast majority of the world's PCs and made its five-note jingle one of the most recognizable tunes on television.

As the market changed with the dawn of the PC era, Carter, who also was an instructor of electrical engineering technology while at Purdue, saw the need for Intel to begin talking to a broad audience beyond the design engineers who had been its traditional focus. He developed an innovative cooperative advertising program in which Intel shared the cost with computer manufacturers to promote its microprocessors as the "computer inside the computer." He also worked with Intel President Andy Grove to create the iconic Pentium brand name.

A native of Louisville, Kentucky, Carter joined Intel in 1981 and from 1985 to 1989 served as technical assistant to the president. He progressed from marketing manager for the End-User Marketing Group to general manager of the End-User Components Division to director of marketing. In 1992, he was recognized for his accomplishments by being elected a vice president of Intel.

Before joining Intel, Carter was an engineering manager at Rockwell International responsible for product design of collision avoidance avionics systems and also helped develop the first microprocessor-based radar altimeters. He holds several patents, among other things for radio frequency (RF) antenna designs.

Carter earned bachelor's degrees in electrical engineering and physics from Rose-Hulman Institute of Technology in Terre Haute, Ind. He received his MBA from Harvard University in 1981. In 2003, he earned a master's degree in astronomy from Swinburne University of Technology in Melbourne, Australia.
