Rossmann - Complete User Guide

Conventions Used in this Document

This document follows certain typesetting and naming conventions:

  • Colored, underlined text indicates a link.
  • Colored, bold text highlights something of particular importance.
  • Italicized text notes the first use of a key concept or term.
  • Bold, fixed-width font text indicates a command or command argument that you type verbatim.
  • Examples of commands and output as you would see them on the command line will appear in colored blocks of fixed-width text such as this:
    $ example
    This is an example of commands and output.
    
  • All command line shell prompts appear as a single dollar sign ("$"). Your actual shell prompt may differ.
  • All examples work with bash or ksh shells. Where different, changes needed for tcsh or csh shell users appear in example comments.
  • All names that begin with "my" illustrate examples that you replace with an appropriate name. These include "myusername", "myfilename", "mydirectory", "myjobid", etc.
  • The term "processor core" or "core" throughout this guide refers to the individual CPU cores on a processor chip.

Overview of Rossmann

Rossmann is a compute cluster operated by ITaP and is a member of Purdue's Community Cluster Program. Rossmann went into production on September 1, 2010. It consists of HP (Hewlett Packard) ProLiant DL165 G7 nodes with 64-bit, dual 12-core AMD Opteron 6172 processors (24 cores per node) and 48 GB, 96 GB, or 192 GB of memory. All nodes have 10 Gigabit Ethernet interconnects and a 5-year warranty. Rossmann is planned to be decommissioned in 2015.

Namesake

Rossmann is named in honor of Michael Rossmann, Purdue's Hanley Distinguished Professor of Biological Sciences. More information about his life and impact on Purdue is available in an ITaP Biography of Michael Rossmann.

Detailed Hardware Specification

Rossmann consists of four logical sub-clusters, each with a different memory/storage configuration. All nodes in the cluster have dual 12-core AMD Opteron 6172 processors and 10 Gigabit Ethernet (10GigE).

Sub-Cluster   Number of Nodes   Processors per Node            Cores per Node   Memory per Node   Interconnect   Disk
Rossmann-A    392               Two 2.1 GHz 12-Core AMD 6172   24               48 GB             10 GigE        250 GB
Rossmann-B    40                Two 2.1 GHz 12-Core AMD 6172   24               96 GB             10 GigE        250 GB
Rossmann-C    2                 Two 2.1 GHz 12-Core AMD 6172   24               192 GB            10 GigE        1 TB
Rossmann-D    4                 Two 2.1 GHz 12-Core AMD 6172   24               192 GB            10 GigE        2 TB

Rossmann nodes run Red Hat Enterprise Linux 5 (RHEL5) and use Moab Workload Manager 7 and TORQUE Resource Manager 4 as the portable batch system (PBS) for resource and job management. Rossmann also runs jobs for BoilerGrid whenever processor cores in it would otherwise be idle. The application of operating system patches occurs as security needs dictate. All nodes allow for unlimited stack usage, as well as unlimited core dump size (though disk space and server quotas may still be a limiting factor).

For more information about the TORQUE Resource Manager:

On Rossmann, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • PGI 11.8-0
  • ACML
  • OpenMPI 1.6.3

To load the recommended set:

$ module load devel

To verify what you loaded:

$ module list

Node Interconnect Systems

The system interconnect is the networking technology that connects nodes of a cluster to each other. This is often much faster and sometimes radically different from the networking available between a resource and other machines or the outside world. Interconnects have different characteristics that may affect parallel message-passing programs and their design. Each ITaP research resource has different interconnect options available, and some have more than one available to all or only portions of the resource's nodes. For information on which interconnects are available, refer to the hardware specification for the resource above. Details about the specific interconnects available on Rossmann follow.

10 Gigabit Ethernet

Ten Gigabit Ethernet (10GigE) is a form of Ethernet, currently the most widely used network link technology, that transfers data at approximately ten gigabits per second, one hundred times faster than 100 Mbps Ethernet. Consequently, 10GigE cable runs must also be much shorter.

Accounts on Rossmann

Purchasing Nodes

Information Technology at Purdue (ITaP) operates a significant shared cluster computing infrastructure developed over several years through focused acquisitions using funds from grants, faculty startup packages, and institutional sources. These "community clusters" are now at the foundation of Purdue's research cyberinfrastructure.

We strongly encourage any Purdue faculty or staff with computational needs to join this growing community and enjoy the enormous benefits this shared infrastructure provides:

  • Peace of Mind
    ITaP system administrators take care of security patches, attempted hacks, operating system upgrades, and hardware repair so faculty and graduate students can concentrate on research.
  • Low Overhead
    ITaP data centers provide infrastructure such as networking, racks, floor space, cooling, and power.
  • Cost Effective
    ITaP works with vendors to obtain the best price for computing resources by pooling funds from different disciplines to leverage greater group purchasing power.

Through the Community Cluster Program, Purdue affiliates have invested several million dollars in computational and storage resources from Q4 2006 to the present with great success in both the research accomplished and the money saved on equipment purchases.

For more information or to purchase access to our latest cluster today, see the Access Purchase page. To get updates on ITaP's community cluster program, please subscribe to the Community Cluster Program Mailing List.

Cluster Partner Services

In addition to priority access to a number of processor cores, partners in our Community Cluster Program may also take advantage of additional services offered to them free of charge. These include:

  • Unix Group
    Restrict access to files or programs with Unix file permissions, granted on the basis of the users you approve for access to your queues.
  • Application Storage
    Store your custom application binaries in central storage that is backed-up and available from all clusters, but not part of your personal home directory.
  • Subversion (SVN) Repository
    Store and manage your code or documents through a centrally-supported, professional-grade, revision control system.

To request any of these be created for your research group, or for more information, please email rcac-help@purdue.edu.

Obtaining an Account

To obtain an account, you must be part of a research group which has purchased access to Rossmann. Refer to the Accounts / Access page for more details on how to request access.

Login / SSH

To submit jobs on Rossmann, log in to the submission host rossmann.rcac.purdue.edu via SSH. This submission host is actually 4 front-end hosts: rossmann-fe00, rossmann-fe01, rossmann-fe02 and rossmann-fe03. The login process randomly assigns one of these front-ends to each login to rossmann.rcac.purdue.edu. While all of these front-end hosts are identical, each has its own /tmp directory. Because a later login may land on a different front-end, data left in /tmp during one session may not be available in the next. ITaP advises using scratch storage for multi-session, shared data instead.

SSH Client Software

Secure Shell or SSH is a way of establishing a secure (encrypted) connection between two computers. It uses public-key cryptography to authenticate the remote computer and (optionally) to allow the remote computer to authenticate the user. Its usual function involves logging in to a remote machine and executing commands, but it also supports tunneling and forwarding of X11 or arbitrary TCP connections. There are many SSH clients available for all operating systems.

Linux / Solaris / AIX / HP-UX / Unix:

  • The ssh command is pre-installed. Log in using ssh myusername@servername.

Microsoft Windows:

  • PuTTY is an extremely small download of a free, full-featured SSH client.
  • Secure CRT is a commercial SSH client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • The ssh command is pre-installed. You may start a local terminal window from "Applications->Utilities". Log in using ssh myusername@servername.
  • MacSSH is another free SSH client.
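
For example, to log in to Rossmann from a Linux or Mac OS X terminal, replacing myusername with your own career account name:

$ ssh myusername@rossmann.rcac.purdue.edu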

SSH Keys

SSH works with many different means of authentication. One popular authentication method is Public Key Authentication (PKA). PKA is a method of establishing your identity to a remote computer using related sets of encryption data called keys. PKA is a more secure alternative to traditional password-based authentication with which you are probably familiar.

To employ PKA via SSH, you manually generate a keypair (also called SSH keys) in the location from where you wish to initiate a connection to a remote machine. This keypair consists of two text files: private key and public key. You keep the private key file confidential on your local machine or local home directory (hence the name "private" key). You then log in to a remote machine (if possible) and append the corresponding public key text to the end of a specific file, or have a system administrator do so on your behalf. In future login attempts, PKA compares the public and private keys to verify your identity; only then do you have access to the remote machine.

As a user, you can create, maintain, and employ as many keypairs as you wish. If you connect to a computational resource from your work laptop, your work desktop, and your home desktop, you can create and employ keypairs on each. You can also create multiple keypairs on a single local machine to serve different purposes, such as establishing access to different remote machines or establishing different types of access to a single remote machine. In short, PKA via SSH offers a secure but flexible means of identifying yourself to all kinds of computational resources.
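
As a sketch of this process from a Linux or Mac OS X client, using the standard OpenSSH tools (if ssh-copy-id is not available on your system, you can instead append the contents of your local ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the remote machine yourself):

$ ssh-keygen -t rsa                                 # generate a keypair; enter a passphrase when prompted
$ ssh-copy-id myusername@rossmann.rcac.purdue.edu   # append your public key to ~/.ssh/authorized_keys on Rossmann
$ ssh myusername@rossmann.rcac.purdue.edu           # future logins now use public key authentication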

Passphrases and SSH Keys

Creating a keypair prompts you to provide a passphrase for the private key. This passphrase is different from a password in a number of ways. First, a passphrase is, as the name implies, a phrase. It can include most types of characters, including spaces, and has no limits on length. Secondly, the remote machine does not receive this passphrase for verification. Its only purpose is to unlock your local private key, and each passphrase applies only to the particular private key it protects.

Perhaps you are wondering why you would need a private key passphrase at all when using PKA. If the private key remains secure, why the need for a passphrase just to use it? Indeed, if the location of your private keys were always completely secure, a passphrase might not be necessary. In reality, a number of situations could arise in which someone may improperly gain access to your private key files. In these situations, a passphrase offers another level of security for you, the user who created the keypair.

Think of the private key/passphrase combination as being analogous to your ATM card/PIN combination. The ATM card itself is the object that grants access to your important accounts, and as such, should remain secure at all times—just as a private key should. But if you ever lose your wallet or someone steals your ATM card, you are glad that your PIN exists to offer another level of protection. The same is true for a private key passphrase.

When you create a keypair, you should always provide a corresponding private key passphrase. For security purposes, avoid using phrases which automated programs can discover (e.g. phrases that consist solely of words in English-language dictionaries). This passphrase is not recoverable if forgotten, so make note of it. Only a few situations warrant using a non-passphrase-protected private key—conducting automated file backups is one such situation. If you need to use a non-passphrase-protected private key to conduct automated backups to Fortress, see the No-Passphrase SSH Keys section.

SSH X11 Forwarding

SSH supports tunneling of X11 (X-Windows). If you have an X11 server running on your local machine, you may use X11 applications on remote systems and have their graphical displays appear on your local machine. These X11 connections are tunneled and encrypted automatically by your SSH client.

Installing an X11 Server

To use X11, you will need to have a local X11 server running on your personal machine. Both free and commercial X11 servers are available for various operating systems.

Linux / Solaris / AIX / HP-UX / Unix:

  • An X11 server is at the core of all graphical sessions. If you are logged in to a graphical environment on these operating systems, you are already running an X11 server.

Microsoft Windows:

  • Xming is a free X11 server available for all versions of Windows, although it may occasionally hang and require a restart. Download the "Public Domain Xming" or donate to the development for the newest version.
  • Hummingbird eXceed is a commercial X11 server available for all versions of Windows.
  • Cygwin is another free X11 server available for all versions of Windows. Download and run setup.exe. During installation, you must select the following packages which are not included by default:
    • X-startup-scripts
    • XFree86-lib-compat
    • xorg-*
    • xterm
    • xwinwm
    • lib-glitz-glx1
    • opengl (if you also want OpenGL support, under the Graphics group)
    To start the Cygwin X server, open a Cygwin terminal, type XWin -multiwindow, and press Enter. Once the X server is running, you may run your SSH client.

Mac OS X:

  • X11 is available as an optional install on the Mac OS X install disks prior to 10.7/Lion. Run the installer, select the X11 option, and follow the instructions. For 10.7+ please download XQuartz.

Enabling X11 Forwarding in your SSH Client

Once you are running an X11 server, you will need to enable X11 forwarding/tunneling in your SSH client:

  • "ssh": X11 tunneling should be enabled by default. To be certain it is enabled, you may use ssh -Y.
  • PuTTY: Prior to connection, in your connection's options, under "X11", check "Enable X11 forwarding", and save your connection.
  • Secure CRT: Right-click a saved connection, and select "Properties". Expand the "Connection" settings, then go to "Port Forwarding" -> "Remote/X11". Check "Forward X11 packets" and click "OK".

SSH will set the remote environment variable $DISPLAY to "localhost:XX.YY" when this is working correctly. If you had previously set your $DISPLAY environment variable to your local IP or hostname, you must remove any set/export/setenv of this variable from your login scripts. The environment variable $DISPLAY must be left as SSH sets it, which is to a random local port address. Setting $DISPLAY to an IP or hostname will not work.
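
To confirm that X11 forwarding is working, one quick test (assuming your local X11 server is running and xclock is installed on the remote system) is:

$ ssh -Y myusername@rossmann.rcac.purdue.edu
  (check that SSH set the display; it should be a localhost address, not your own IP or hostname)
$ echo $DISPLAY
  (a small clock window should appear on your local screen)
$ xclock &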

SSH VNC Connections

If you find that X11 forwarding is not working well due to limited bandwidth or high latency, such as on an off-campus or wireless connection, a VNC connection may be a suitable alternative. Setting up an encrypted VNC connection is more complex than standard X11 forwarding, so VNC is only recommended after testing X11 forwarding.

Installing a VNC client

To use VNC, you will need to have a VNC client running on your personal machine.

Linux / Unix:

  • There are several VNC clients available in the repositories of various linux/unix distributions. Vinagre (GNOME) and KRDC (KDE) are popular choices.

Microsoft Windows:

  • TightVNC is a free VNC client and server for Windows. When installing, unselect TightVNC server, so that only the viewer is installed.

Mac OS X:

  • Chicken is a free VNC client for OS X. While OS X has a built-in VNC client, it doesn't always work well with linux VNC servers.

Enabling VNC over an SSH tunnel

Once you have a VNC client, you will need to open a tunnel in your SSH client:

  • "ssh": ssh rossmann -L 8900/localhost/5901
  • PuTTY: Prior to connection, in your connection's options, under "Tunnels", put 8900 in the source port, localhost:5901 in the destination, and select Local and Auto, then click Add.
  • Secure CRT: Right-click a saved connection, and select "Properties". Expand the "Connection" settings, then go to "Port Forwarding". Under "Locally forwarded connections" click Add, put vnc for the name, 8900 for the local port, and 5901 for the remote port, then click OK to close the dialogs.

Once the SSH tunnel is set up, you will want to load the tigervnc module and start a VNC connection:

$ module load tigervnc
$ vncserver -geometry 1024x768 -alwaysshared -dpi 96 -localhost :1
$ chmod 700 ~/.vnc # only done once

The first time you run the vncserver command you will be prompted to create a password. Make sure to enter a password so that your connection is secure. You will also want to secure your ~/.vnc directory, as shown above.

Now you can connect to localhost:8900 with your VNC client and use the connection.

Once you are finished with the connection:

$ vncserver -kill :1

Note: If another user is using a VNC connection on the same front-end, you may need to increment the screen and forwarded port to ones not in use, e.g. :2/5902, :3/5903, etc., as in the sketch below.
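
For example, if display :1 is already taken, the same steps using the next free display and port (hypothetical values) would be:

  (from your local machine: forward local port 8900 to VNC display :2, which listens on port 5902)
$ ssh -L 8900:localhost:5902 rossmann.rcac.purdue.edu

  (on the Rossmann front-end: start the VNC server on display :2)
$ module load tigervnc
$ vncserver -geometry 1024x768 -alwaysshared -dpi 96 -localhost :2

Connect your VNC client to localhost:8900 as before, and stop the server with vncserver -kill :2 when finished.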

Passwords

If you have received a default password as part of the process of obtaining your account, you should change it before you log onto Rossmann for the first time. Change your password from the SecurePurdue website. You will have the same password on all ITaP systems such as Rossmann, Purdue email, or Blackboard.

Passwords may need to be changed periodically in accordance with Purdue security policies. Passwords must follow certain requirements, as described on the SecurePurdue webpage, and ITaP recommends following its guidelines for selecting a strong password.

ITaP staff will NEVER ask for your password, by email or otherwise.

Never share your password with another user or make your password known to anyone else.

Email

There is no local email delivery available on Rossmann. Rossmann forwards all email which it receives to your career account email address.

Login Shell

Your shell is the program that generates your command-line prompt and processes commands. On ITaP research systems, several common shell choices are available:

Name Description Path
bash A Bourne-shell (sh) compatible shell with many newer advanced features as well. Bash is the default shell for new ITaP research system accounts. This is the most common shell in use on ITaP research systems. /bin/bash
tcsh An advanced variant on csh with all the features of modern shells. Tcsh is the second most popular shell in use today. /bin/tcsh
zsh An advanced shell which incorporates all the functionality of bash and tcsh combined, usually with identical syntax. /bin/zsh

To find out what shell you are running right now, simply use the ps command:

$ ps
  PID TTY          TIME CMD
30181 pts/27   00:00:00 bash
30273 pts/27   00:00:00 ps

To use a different shell on a one-time or trial basis, simply type the shell name as a command. To return to your original shell, type exit:

$ ps
  PID TTY          TIME CMD
30181 pts/27   00:00:00 bash
30273 pts/27   00:00:00 ps

$ tcsh
% ps
  PID TTY          TIME CMD
30181 pts/27   00:00:00 bash
30313 pts/27   00:00:00 tcsh
30315 pts/27   00:00:00 ps

% exit
$

To permanently change your default login shell, use the secure web form provided to change shells.

There is a propagation delay which may last up to two hours before this change will take effect. Once propagated you will need to log out and log back in to start in your new shell.

File Storage and Transfer for Rossmann

Storage Options

File storage options on ITaP research systems include long-term storage (home directories, Fortress) and short-term storage (scratch directories, /tmp directory). Each option has different performance and intended uses, and some options vary from system to system as well. ITaP provides daily snapshots of home directories for a limited time for accidental deletion recovery. ITaP does not back up scratch directories or temporary storage and regularly purges old files from scratch and /tmp directories. More details about each storage option appear below.

Home Directories

ITaP provides home directories for long-term file storage. Each user ID has one home directory. You should use your home directory for storing important program files, scripts, input data sets, critical results, and frequently used files. You should store infrequently used files on Fortress. Your home directory becomes your current working directory, by default, when you log in.

ITaP provides daily snapshots of your home directory for a limited period of time in the event of accidental deletion. For additional security, you should store another copy of your files on more permanent storage, such as the Fortress HPSS Archive.

Your home directory physically resides within the Isilon storage system at Purdue. To find the path to your home directory, first log in then immediately enter the following:

$ pwd
/home/myusername

Or from any subdirectory:

$ echo $HOME
/home/myusername

Your home directory and its contents are available on all ITaP research front-end hosts and compute nodes via the Network File System (NFS).

Your home directory has a quota capping the size and/or number of files you may store within. For more information, refer to the Storage Quotas / Limits Section.

Lost Home Directory File Recovery

Only files for which an overnight snapshot has been made are recoverable. If you lose a file the same day you created it, it is NOT recoverable.

To recover files lost from your home directory, use the flost command:

$ flost

Scratch Directories

ITaP provides scratch directories for short-term file storage only. The quota of your scratch directory is much greater than the quota of your home directory. You should use your scratch directory for storing temporary input files which your job reads or for writing temporary output files which you may examine after execution of your job. You should use your home directory and Fortress for longer-term storage or for holding critical results. The hsi and htar commands provide easy-to-use interfaces into the archive and can be used to copy files into the archive interactively or even automatically at the end of your regular job submission scripts.

Files in scratch directories are not recoverable. ITaP does not back up files in scratch directories. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

ITaP purges files from scratch directories that have not been accessed or modified in 90 days. Owners of these files receive an email notice one week before removal. Be sure to regularly check your Purdue email account or set up mail forwarding to an email account you do regularly check. For more information, please refer to our Scratch File Purging Policy.

All users may access scratch directories on Rossmann. To find the path to your scratch directory:

$ findscratch
/scratch/lustreA/m/myusername

The environment variable $RCAC_SCRATCH contains the path to your scratch directory. Use this variable, rather than the literal path, in any scripts. Your actual scratch directory path may change without warning, but this variable will always remain current.

$ echo $RCAC_SCRATCH
/scratch/lustreA/m/myusername
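
For example, to stage a data file into a project subdirectory under your scratch space without hard-coding the path (myproject and mydatafile follow this guide's "my" naming convention):

$ mkdir -p $RCAC_SCRATCH/myproject
$ cp mydatafile $RCAC_SCRATCH/myproject/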

All scratch directories are available on each front-end of all computational resources; however, only the /scratch/lustreA directory is available on Rossmann compute nodes. No other scratch directories are available on Rossmann compute nodes.

To find the path to someone else's scratch directory:

$ findscratch someusername
/scratch/lustreA/s/someusername

Your scratch directory has a quota capping the total size and number of files you may store in it. For more information, refer to the Storage Quotas / Limits section.

/tmp Directory

ITaP provides /tmp directories for short-term file storage only. Each front-end and compute node has a /tmp directory. Your program may write temporary data to the /tmp directory of the compute node on which it is running. That data is available for as long as your program is active. Once your program terminates, that temporary data is no longer available. When used properly, /tmp may provide faster local storage to an active process than any other storage option. You should use your home directory and Fortress for longer-term storage or for holding critical results.

ITaP does not perform backups for the /tmp directory and removes files from /tmp whenever space is low or whenever the system needs a reboot. In the event of a disk crash or file purge, files in /tmp are not recoverable. You should copy any important files to more permanent storage.
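
As a sketch of typical /tmp use inside a job (mydatafile, myprogram, and myoutput are placeholders), copy input to the compute node's local /tmp, run against it, and copy any results you need back to scratch before the job ends:

$ cp $RCAC_SCRATCH/mydatafile /tmp/
$ ./myprogram /tmp/mydatafile > /tmp/myoutput
  (copy results back; /tmp contents on the compute node are gone once the job ends)
$ cp /tmp/myoutput $RCAC_SCRATCH/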

Long-Term Storage

Long-term Storage or Permanent Storage is available to ITaP research users on the High Performance Storage System (HPSS), an archival storage system, commonly referred to as "Fortress". HPSS is a software package that manages a hierarchical storage system. Program files, data files and any other files which are not used often, but which must be saved, can be put in permanent storage. Fortress currently has a 1.2 PB capacity.

Files smaller than 100 MB have their primary copy stored on low-cost disk (the disk cache), with a second copy (a backup of the disk cache) on tape or optical disk; this allows small files to be restored quickly. However, the large latency involved in accessing a larger file (which usually requires a copy from a tape cartridge) makes Fortress unsuitable for direct use by running processes or jobs, even where that is possible. The primary and secondary copies of larger files are stored on separate tape cartridges in the Quantum (ADIC, Advanced Digital Information Corporation) tape library.

To ensure optimal performance for all users, and to keep the Fortress system healthy, please remember the following tips:

  • Fortress operates most effectively with large files - 1 GB or larger. If your data consists of smaller files, use HTAR to create archives directly in Fortress.
  • When working with files on cluster head nodes, use your home directory or a scratch file system, rather than editing or computing on files directly in Fortress. Copy any data you wish to archive to Fortress after computation is complete.
  • The HPSS software does not handle sparse files (files with empty space) in an optimal manner. Therefore, if you must copy a sparse file into HPSS, use HSI rather than the cp or mv commands.
  • Due to the sparse files issue, the rsync command should not be used to copy data into Fortress through NFS, as this may cause problems with the system.

Fortress writes two copies of every file, either to two tapes or to disk and a tape, to protect against medium errors. Unfortunately, Fortress does not automatically switch to the alternate copy when it has trouble accessing the primary. If it seems to be taking an extraordinary amount of time (hours) to retrieve a file, please either email rcac-help@purdue.edu or call ITaP Customer Service at 765-494-4000. We can then investigate why the retrieval is taking so long. If there is an error on the primary copy, we will instruct Fortress to switch to the alternate copy as the primary and recreate a new alternate copy.

For more information about Fortress, how it works, user guides, and how to obtain an account:

Manual File Transfer to Long-Term Storage

There are a variety of ways to manually transfer files to your Fortress home directory for long-term storage.

HSI

HSI, the Hierarchical Storage Interface, is the preferred method of transferring files to and from Fortress. HSI is designed to be a friendly interface for users of the High Performance Storage System (HPSS). It provides a familiar Unix-style environment for working within HPSS while automatically taking advantage of high-speed, parallel file transfers without requiring any special user knowledge.

HSI is already provided on all ITaP research systems as the command hsi. You may download HSI for the following platforms as well:

Any machines using HSI or HTAR must have all firewalls (local and departmental) configured to allow open access from the following IP addresses:

  • 128.210.251.141
  • 128.210.251.142
  • 128.210.251.143
  • 128.210.251.144
  • 128.210.251.145

If you are unsure of how to modify your firewall settings, please consult with your department's IT support or the documentation for your operating system. Access to Fortress is restricted to on-campus networks. If you need to directly access Fortress from off-campus, please use the Purdue VPN service before connecting.

Interactive usage:

$ hsi

*************************************************************************
*                    Purdue University 
*                  High Performance Storage System (HPSS)
*************************************************************************
* This is the Purdue Data Archive, Fortress.  For further information 
* see http://www.rcac.purdue.edu/userinfo/resources/fortress/
*  
*   If you are having problems with HPSS, please call IT/Operational
*   Services at 49-44000 or send E-mail to dxul-help@purdue.edu.
*
*************************************************************************
Username: myusername  UID: 12345  Acct: 12345(12345) Copies: 1 Firewall: off [hsi.3.5.8 Wed Sep 21 17:31:14 EDT 2011] 

[Fortress HSI]/home/myusername->put data1.fits
put  'data1.fits' : '/home/myusername/data1.fits' ( 1024000000 bytes, 250138.1 KBS (cos=11))

[Fortress HSI]/home/myusername->lcd /tmp

[Fortress HSI]/home/myusername->get data1.fits
get  '/tmp/data1.fits' : '/home/myusername/data1.fits' (2011/10/04 16:28:50 1024000000 bytes, 325844.9 KBS )

[Fortress HSI]/home/myusername->quit

Batch transfer file:

put data1.fits 
put data2.fits 
put data3.fits 
put data4.fits 
put data5.fits 
put data6.fits 
put data7.fits 
put data8.fits 
put data9.fits

Batch usage:

$ hsi < my_batch_transfer_file
*************************************************************************
*                    Purdue University 
*                  High Performance Storage System (HPSS)
*************************************************************************
* This is the Purdue Data Archive, Fortress.  For further information 
* see http://www.rcac.purdue.edu/userinfo/resources/fortress/
*  
*   If you are having problems with HPSS, please call IT/Operational
*   Services at 49-44000 or send E-mail to dxul-help@purdue.edu.
*
*************************************************************************
Username: myusername  UID: 12345  Acct: 12345(12345) Copies: 1 Firewall: off [hsi.3.5.8 Wed Sep 21 17:31:14 EDT 2011] 
put  'data1.fits' : '/home/myusername/data1.fits' ( 1024000000 bytes, 250200.7 KBS (cos=11))
put  'data2.fits' : '/home/myusername/data2.fits' ( 1024000000 bytes, 258893.4 KBS (cos=11))
put  'data3.fits' : '/home/myusername/data3.fits' ( 1024000000 bytes, 222819.7 KBS (cos=11))
put  'data4.fits' : '/home/myusername/data4.fits' ( 1024000000 bytes, 224311.9 KBS (cos=11))
put  'data5.fits' : '/home/myusername/data5.fits' ( 1024000000 bytes, 323707.3 KBS (cos=11))
put  'data6.fits' : '/home/myusername/data6.fits' ( 1024000000 bytes, 320322.9 KBS (cos=11))
put  'data7.fits' : '/home/myusername/data7.fits' ( 1024000000 bytes, 253192.6 KBS (cos=11))
put  'data8.fits' : '/home/myusername/data8.fits' ( 1024000000 bytes, 253056.2 KBS (cos=11))
put  'data9.fits' : '/home/myusername/data9.fits' ( 1024000000 bytes, 323218.9 KBS (cos=11))
EOF detected on TTY - ending HSI session
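
As noted in the Scratch Directories section, hsi can also archive results automatically at the end of a regular job submission script. A minimal sketch (mydirectory, myprogram, and myoutput.dat are placeholders), reusing the same standard-input style of batch transfer shown above, would add lines like these to the end of a job script:

cd $RCAC_SCRATCH/mydirectory
./myprogram > myoutput.dat
echo "put myoutput.dat" | hsi    # archive the result to your Fortress home directory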

For more information about HSI:

HTAR

HTAR (short for "HPSS TAR") is a utility program that writes TAR-compatible archive files directly onto Fortress, without having to first create a local file. Its command line was originally based on the AIX tar program, with a number of extensions added to provide extra features.

HTAR is already provided on all ITaP research systems as the command htar. You may download HTAR for the following platforms as well:

Any machines using HSI or HTAR must have all firewalls (local and departmental) configured to allow open access from the following IP addresses:

  • 128.210.251.141
  • 128.210.251.142
  • 128.210.251.143
  • 128.210.251.144
  • 128.210.251.145

If you are unsure of how to modify your firewall settings, please consult with your department's IT support or the documentation for your operating system. Access to Fortress is restricted to on-campus networks. If you need to directly access Fortress from off-campus, please use the Purdue VPN service before connecting.

Usage:

  (Create a tar archive on Fortress named data.tar including all files with the extension ".fits".)
$ htar -cvf data.tar *.fits
HTAR: a   data1.fits                                      
HTAR: a   data2.fits
HTAR: a   data3.fits
HTAR: a   data4.fits
HTAR: a   data5.fits
HTAR: a   data6.fits
HTAR: a   data7.fits
HTAR: a   data8.fits
HTAR: a   data9.fits
HTAR: a   /tmp/HTAR_CF_CHK_17953_1317760775
HTAR Create complete for data.tar. 9,216,006,144 bytes written for 9 member files, max threads: 3 Transfer time: 29.622 seconds (311.121 MB/s)
HTAR: HTAR SUCCESSFUL   

  (Unpack a tar archive on Fortress named data.tar into a scratch directory for use in a batch job.)
$ cd $RCAC_SCRATCH/job_dir
$ htar -xvf data.tar 
HTAR: x data1.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data2.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data3.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data4.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data5.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data6.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data7.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data8.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data9.fits, 1024000000 bytes, 2000001 media blocks
HTAR: Extract complete for data.tar, 9 files. total bytes read: 9,216,004,608 in 33.914 seconds (271.749 MB/s )
HTAR: HTAR SUCCESSFUL

  (Look at the contents of the data.tar HTAR archive on Fortress.)
$ htar -tvf data.tar
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:30  data1.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data2.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data3.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data4.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data5.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data6.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data7.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data8.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data9.fits
HTAR: -rw-------  myusername/pucc        256 2011-10-04 16:39  /tmp/HTAR_CF_CHK_17953_1317760775
HTAR: Listing complete for data.tar, 10 files 10 total objects
HTAR: HTAR SUCCESSFUL

  (Unpack a single file, "data7.fits", from the tar archive on Fortress named data.tar into a scratch directory.)
$ htar -xvf data.tar data7.fits
HTAR: x data7.fits, 1024000000 bytes, 2000001 media blocks
HTAR: Extract complete for data.tar, 1 files. total bytes read: 1,024,000,512 in 3.642 seconds (281.166 MB/s )
HTAR: HTAR SUCCESSFUL

For more information about HTAR:

SCP

Fortress does NOT support SCP.

SFTP

Fortress does NOT support SFTP.

Environment Variables

Several environment variables are automatically defined for you to help you manage your storage. Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change. Some of the environment variables you should have are:

Name Description
HOME path to your home directory
PWD path to your current directory
RCAC_SCRATCH path to scratch filesystem

By convention, environment variable names are all uppercase. You may use them on the command line or in any scripts in place of and in combination with hard-coded values:

$ ls $HOME
...

$ ls $RCAC_SCRATCH/myproject
...

To find the value of any environment variable:

$ echo $RCAC_SCRATCH
/scratch/lustreA/m/myusername

To list the values of all environment variables:

$ env
USER=myusername
HOME=/home/myusername
RCAC_SCRATCH=/scratch/lustreA/m/myusername
...

You may create or overwrite an environment variable. To pass (export) the value of a variable in either bash or ksh:

$ export MYPROJECT=$RCAC_SCRATCH/myproject

To assign a value to an environment variable in either tcsh or csh:

$ setenv MYPROJECT $RCAC_SCRATCH/myproject

Storage Quotas / Limits

ITaP imposes some limits on your disk usage on research systems. ITaP implements a quota on each filesystem. Each filesystem (home directory, scratch directory, etc.) may have a different limit. If you exceed the quota, you will not be able to save new files or new data to the filesystem until you delete or move data to long-term storage.

Checking Quota Usage

To check the current quotas of your home and scratch directories use the myquota command:

$ myquota
Type        Filesystem          Size    Limit  Use         Files    Limit  Use
==============================================================================
home        extensible         5.0GB   10.0GB  50%             -        -   - 
scratch     /scratch/lustreA/    8KB  476.8GB   0%             2  100,000   0%

The columns are as follows:

  1. Type: indicates home or scratch directory.
  2. Filesystem: name of storage option.
  3. Size: sum of file sizes in bytes.
  4. Limit: allowed maximum on sum of file sizes in bytes.
  5. Use: percentage of file-size limit currently in use.
  6. Files: number of files and directories (not the size).
  7. Limit: allowed maximum on number of files and directories. It is possible, though unlikely, to reach this limit and not the file-size limit if you create a large number of very small files.
  8. Use: percentage of file-number limit currently in use.

If you find that you have reached your quota in either your home directory or your scratch directory, obtain estimates of your disk usage. Find the top-level directories which have high disk usage, then study the subdirectories to discover where the heaviest usage lies.

To see in a human-readable format an estimate of the disk usage of your top-level directories in your home directory:

$ du -h --max-depth=1 $HOME
32K /home/myusername/mysubdirectory_1
529M    /home/myusername/mysubdirectory_2
608K    /home/myusername/mysubdirectory_3

The second directory is the largest of the three, so apply the du command to it to see where its usage lies, as in the example below.
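
For example:

$ du -h --max-depth=1 $HOME/mysubdirectory_2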

To see in a human-readable format an estimate of the disk usage of your top-level directories in your scratch file directory:

$ du -h --max-depth=1 $RCAC_SCRATCH
160K    /scratch/lustreA/m/myusername

This strategy can be very helpful in figuring out the location of your largest usage. Move unneeded files and directories to long-term storage to free space in your home and scratch directories.

Increasing Your Storage Quota

Home Directory

If you find you need additional disk space in your home directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may go to the BoilerBackpack Quota Management site and use the sliders there to increase the amount of space allocated to your research home directory vs. other storage options, up to a maximum of 100GB.

Scratch Directory

If you find you need additional disk space in your scratch directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may ask for a quota increase at rcac-help@purdue.edu. Quota requests up to 2TB and 200,000 files on LustreA or LustreC can be processed quickly.

Archive and Compression

There are several options for archiving and compressing groups of files or directories on ITaP research systems. The most commonly used options are:

  • tar   (more information)
    Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.
    Examples:
      (list contents of archive somefile.tar)
    $ tar tvf somefile.tar
    
      (extract contents of somefile.tar)
    $ tar xvf somefile.tar
    
      (extract contents of gzipped archive somefile.tar.gz)
    $ tar xzvf somefile.tar.gz
    
      (extract contents of bzip2 archive somefile.tar.bz2)
    $ tar xjvf somefile.tar.bz2
    
      (archive all ".c" files in current directory into one archive file)
    $ tar cvf somefile.tar *.c 
    
      (archive and gzip-compress all files in a directory into one archive file)
    $ tar czvf somefile.tar.gz somedirectory/
    
      (archive and bzip2-compress all files in a directory into one archive file)
    $ tar cjvf somefile.tar.bz2 somedirectory/
    
    
    Other arguments for tar can be explored by using the man tar command.
  • gzip   (more information)
    The standard compression system for all GNU software.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ gzip somefile
    
      (uncompress file somefile.gz - also removes compressed file)
    $ gunzip somefile.gz
    
  • bzip2   (more information)
    Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ bzip2 somefile
    
      (uncompress file somefile.bz2 - also removes compressed file)
    $ bunzip2 somefile.bz2
    

There are several other, less commonly used, options available as well:

  • zip
  • 7zip
  • xz

File Transfer

There are a variety of ways to transfer data to and from ITaP research systems. Which you should use depends on several factors, including the ease of use for you personally, connection speed and bandwidth, and the size and number of files which you intend to transfer.

FTP

ITaP does not support FTP on any ITaP research systems because it does not allow for secure transmission of data. Try using one of the other methods described below instead of FTP.

SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH protocol. SCP is available as a protocol choice in some graphical file transfer programs and also as a command line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.

Command-line usage:

  (to a remote system from local)
$ scp sourcefilename myusername@hostname:somedirectory/destinationfilename

  (from a remote system to local)
$ scp myusername@hostname:somedirectory/sourcefilename destinationfilename

  (recursive directory copy to a remote system from local)
$ scp -r sourcedirectory/ myusername@hostname:somedirectory/

Linux / Solaris / AIX / HP-UX / Unix:

  • You should have already installed the "scp" command-line program.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SCP and SFTP client.
  • PuTTY also offers "pscp.exe", which is an extremely small program and a basic SCP client.
  • Secure FX is a commercial SCP and SFTP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • You should have already installed the "scp" command-line program. You may start a local terminal window from "Applications->Utilities".

SFTP

SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Command-line SFTP cannot recursively copy directory contents; to do so, try using SCP or a graphical SFTP client.

Command-line usage:

$ sftp -B buffersize myusername@hostname

      (to a remote system from local)
sftp> put sourcefile somedir/destinationfile
sftp> put -P sourcefile somedir/

      (from a remote system to local)
sftp> get sourcefile somedir/destinationfile
sftp> get -P sourcefile somedir/

sftp> exit
  • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
  • -P: optional, preserve file attributes and permissions

Linux / Solaris / AIX / HP-UX / Unix:

  • The "sftp" command line program should already be installed.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SFTP and SCP client.
  • PuTTY also offers "psftp.exe", which is an extremely small program and a basic SFTP client.
  • Secure FX is a commercial SFTP and SCP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • The "sftp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • MacSFTP is a free graphical SFTP client for Macs.

Globus

Globus, previously known as Globus Online, is a powerful and easy-to-use file transfer service for transferring files virtually anywhere. It works within ITaP's various research storage systems; it connects ITaP with remote research sites running Globus; and it connects research systems to personal systems. You may use Globus to connect to your home, scratch, and Fortress storage directories. Since Globus is web-based, it works on any operating system that is connected to the internet. The Globus Personal client is available for Windows, Linux, and Mac OS X. Globus is primarily used as a graphical means of transfer, but it can also be used from the command line.

Globus Web:

  • Navigate to globus.org
  • Click "Log In" in the upper right.
  • In the upper right of the Username box, choose "alternate login" and select "InCommon / CILogon"
  • On the identity screen select "Purdue University Main Campus" then login with your Purdue account.
  • On your first login it will ask to make a connection to a Globus account. If you have one, sign in; otherwise, click the link to create a new account.
  • Now you're at the main screen. Click "Start Transfer" which will bring you to a two endpoint interface.
  • Purdue's endpoint is named "purdue#rcac"; you can start typing "purdue" and it will autocomplete.
  • The paths to research storage are the same as they are when you're logged into the clusters, but are provided below for reference.
    • Home directory: /~/
    • Scratch directory: /scratch/lustreA/m/myusername where m is the first letter of your username and myusername is your career account name.
    • Group directory: /group/mygroupname where mygroupname is the name of your group.
    • Fortress long-term storage: /archive/fortress/home/myusername where myusername is your career account name.

  • For the second endpoint, you can choose any other Globus endpoint, such as another research site, or a Globus Personal endpoint, which will allow you to transfer to a personal workstation or laptop.

Globus Personal Client setup:

  • On the endpoint page from earlier, click "Get Globus Connect Personal" or download it from here: Globus Connect Personal
  • Name this particular personal system and click "Generate Setup Key" on this page: Create Globus Personal endpoint
  • Copy the key and paste it into the setup box when installing the client for your system.
  • Your personal system is now available as an endpoint within the Globus transfer interface.

Globus Command Line:

For more information, please see Globus Support.

Windows Network Drive / SMB

SMB (Server Message Block), also known as CIFS, is an easy-to-use file transfer protocol that is useful for transferring files between ITaP research systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used from the command line.

Windows:

  • Windows 7: Click Windows menu > Computer, then click Map Network Drive in the top bar
  • Windows 8.1: Tap the Windows key, type computer, select This PC, click Computer > Map Network Drive in the top bar
  • Windows XP: Click Start > My Computer, then click Tools > Map Network Drive
  • In the folder location enter the following information and click Finish:

    • To access your home directory, enter \\samba.rcac.purdue.edu\myusername where myusername is your career account name.
    • To access your scratch storage on Rossmann, enter \\samba.rcac.purdue.edu\scratch. Once mapped, you will be able to navigate to rossmann\m\myusername where m is the first letter of your username and myusername is your career account name. You may also navigate to any of the other cluster scratch directories from this drive mapping.
    • To access Fortress long-term storage, enter \\fortress-smb.rcac.purdue.edu\myusername where myusername is your career account name.

  • You may be prompted for login information. Enter your username as onepurdue\myusername and your account password. If you omit the onepurdue prefix, you will not be able to log in.
  • Your home, scratch, or Fortress directory should now be mounted as a drive in the Computer window.

Mac OS X:

  • In the Finder, click Go > Connect to Server
  • In the Server Address enter the following information and click Connect:

    • To access your home directory, enter smb://samba.rcac.purdue.edu/myusername where myusername is your career account name.
    • To access your scratch storage on Rossmann, enter smb://samba.rcac.purdue.edu/scratch. Once connected, you will be able to navigate to rossmann/m/myusername where m is the first letter of your username and myusername is your career account name. You may also navigate to any of the other cluster scratch directories from this mount.
    • To access Fortress long-term storage, enter smb://fortress-smb.rcac.purdue.edu/myusername where myusername is your career account name.

  • You may be prompted for login information. Enter your username and password, and enter onepurdue for the domain; omitting the domain will prevent you from logging in.

Linux:

  • There are several graphical methods to connect in Linux depending on your desktop environment. Once you find out how to connect to a network server on your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
  • If you would like access via Samba on the command line, you may install smbclient, which gives ftp-like access and can be used as shown below; the server paths are the same as those listed in the Mac OS X section above. SCP or SFTP is recommended over this use case.
    smbclient //samba.rcac.purdue.edu/myusername -U myusername -W onepurdue
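
    Once connected, smbclient presents an ftp-like prompt. A brief sketch of a typical session using standard smbclient commands (myfilename is a placeholder):

      (list files in the share)
    smb: \> ls
      (download a file)
    smb: \> get myfilename
      (upload a file)
    smb: \> put myfilename
    smb: \> exit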

Applications on Rossmann

Provided Applications

The following table lists the third-party software which ITaP has installed on its research systems. Additional software may be available. To see the software on a specific system, run the command module avail on that system. Please contact rcac-help@purdue.edu if you are interested in the availability of software not shown in this list.

Software available on ITaP research systems (Radon, Steele, Coates, Rossmann, Hansen, Carter, and Peregrine 1; availability varies by system):
Abaqus ¹
AcGrace
Amber ¹
Ann
ANSYS ¹
ATK
Antelope
Auto3Dem
ATLAS
BinUtils
BLAST
Boost
Cairo
CDAT
CGNSLib
Cmake
COMSOL ²
CPLEX ¹
DX
Eman
Eman2
Ferret
FFMPEG
FFTW
FLUENT ¹
GAMESS
GAMS
Gaussian ¹
GCC (Compilers)
GDAL
GemPak
Git
GLib
GMP
GMT
GrADS
GROMACS
GS
GSL
GTK+
GTKGlarea
Guile
HarminV
HDF4
HDF5
Hy3S
ImageMagick
IMSL ¹
Intel Compilers ¹
Jackal ²
Jasper
Java
LAMMPS
LibCTL
LibPNG
LibTool
LoopyMod ²
Maple ¹
Mathematica ¹
MATLAB ¹
Meep
MoPac
MPB
MPFR
MPICH
MPICH2
MPIExec
MrBayes
MUMPS
MVAPICH2
NAMD
NCL
NCO
NCView
NetCDF
NETPBM
NWChem
Octave
OpenMPI
Pango
Petsc
PGI Compilers ¹
Phrap
Pixman
PKG-Config
Proj
Python
QTLC
Rational
R
SAC
SAS ¹
ScaLAPACK
Seismic
Subversion
SWFTools
Swig
SysTools
Tao
TecPlot ²
TotalView ¹
UDUNITS
Valgrind
VMD
Weka

¹ Only users on Purdue's West Lafayette campus may use this software.
² Only specific research groups may use this software.

Please contact rcac-help@purdue.edu for specific questions about software license restrictions on ITaP research systems.

Environment Management with the Module Command

ITaP uses the module command as the preferred method to manage your processing environment. With this command, you may load applications and compilers along with their libraries and paths. Modules are packages which you load and unload as needed.

Please use the module command and do not manually configure your environment, as ITaP staff will frequently make changes to the specifics of various packages. If you use the module command to manage your environment, these changes will be handled for you transparently.

To view a brief usage report:

$ module

Below follows a short introduction to the module command. You can see more in the man page for module.

List Available Modules

To see what modules are available on this system:

$ module avail

To see which versions of a specific compiler are available on this system:

$ module avail gcc
$ module avail intel
$ module avail pgi

To see available modules for MPI libraries:

$ module avail openmpi
$ module avail mvapich2
$ module avail impi
$ module avail mpich2

To see available modules for specific provided applications, use names from the list obtained with the command module avail:

$ module avail abaqus
$ module avail matlab
$ module avail mathematica

Load / Unload a Module

All modules consist of both a name and a version number. When loading a module, you may use only the name to load the default version, or you may specify which version you wish to load.

For each cluster, ITaP makes a recommendation regarding the set of compiler, math library, and message-passing library for parallel code. To load the recommended set:

$ module load devel

To verify what you loaded:

$ module list

To load the default version of a specific compiler, choose one of the following commands:

$ module load gcc
$ module load intel
$ module load pgi

To load a specific version of the recommended compiler, include the version number:

$ module load pgi/11.8-0

When running a job, you must load any relevant modules from within your job submission file so that they are loaded on the compute node(s). Loading modules on the front end before submitting your job is sufficient while you are developing your application on the front end, but it is not sufficient for the compute node(s) that run your job in production; you must load the same modules there. A sketch of a submission file appears below.
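
For example, a minimal TORQUE (PBS) submission file might load the recommended module set before running your program. This is only an illustrative sketch; the resource requests, myqueuename, and myprogram are placeholders following this guide's "my" convention:

#!/bin/bash
#PBS -q myqueuename
#PBS -l nodes=1:ppn=24
#PBS -l walltime=00:30:00

# start in the directory from which the job was submitted
cd $PBS_O_WORKDIR

# load the recommended compiler, math library, and MPI modules on the compute node
module load devel

./myprogram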

To unload a module, enter the same module name used to load that module. Unloading will attempt to undo the environmental changes which a previous load command installed.

To unload the default version of a specific compiler:

$ module unload gcc
$ module unload intel
$ module unload pgi

To unload a specific version of the recommended compiler, include the same version number you used to load it:

$ module unload pgi/11.8-0

Apply the same methods to manage the modules of provided applications:

$ module load matlab
$ module unload matlab

To unload all currently loaded modules:

$ module purge

List Currently Loaded Modules

To see currently loaded modules:

$ module list
Currently Loaded Modulefiles:
  1) intel/12.1

To unload a module:

$ module unload intel
$ module list
No Modulefiles Currently Loaded.

Show Module Details

To learn more about what a module does to your environment, you may use the module show module_name command, where module_name is any name in the list from command module avail. This can be either a default name like "intel", "gcc", "pgi", and "matlab", or a specific version of a module, such as "intel/11.1.072". Here is an example showing what loading the default Intel compiler does to the processing environment:

$ module show intel
-------------------------------------------------------------------
/opt/modules/modulefiles/intel/12.1:

module-whatis    invoke Intel 12.1.0 Compilers (64-bit) 
prepend-path     PATH /opt/intel/composer_xe_2011_sp1.6.233/bin/intel64 
prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 
prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 
prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
prepend-path     LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 
prepend-path     NLSPATH /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/locale/%l_%t/%N 
prepend-path     NLSPATH /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64/locale/%l_%t/%N 
prepend-path     CPATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/include 
setenv           CC icc 
setenv           CXX icpc 
setenv           FC ifort 
setenv           ICC_HOME /opt/intel/composer_xe_2011_sp1.6.233 
setenv           IFORT_HOME /opt/intel/composer_xe_2011_sp1.6.233 
setenv           MKL_HOME /opt/intel/composer_xe_2011_sp1.8.273/mkl 
setenv           TBBROOT /opt/intel/composer_xe_2011_sp1.6.233/tbb 
setenv           LAPACK_INCLUDE -I/opt/intel/composer_xe_2011_sp1.8.273/mkl/include 
setenv           LAPACK_INCLUDE_F95 -I/opt/intel/composer_xe_2011_sp1.8.273/mkl/include/intel64/lp64 
setenv           LINK_LAPACK -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
setenv           LINK_LAPACK_STATIC -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Bstatic -Wl,--start-group /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
setenv           LINK_LAPACK95 -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
setenv           LINK_LAPACK95_STATIC -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -Bstatic -Wl,--start-group /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
-------------------------------------------------------------------

To show what loading a specific Intel compiler version does to the processing environment:

$ module show intel/11.1.072
-------------------------------------------------------------------
/opt/modules/modulefiles/intel/11.1.072:

module-whatis    invoke Intel 11.1.072 64-bit Compilers 
prepend-path     PATH /opt/intel/Compiler/11.1/072/bin/intel64 
prepend-path     LD_LIBRARY_PATH /opt/intel/mkl/10.2.5.035/lib/em64t 
prepend-path     LD_LIBRARY_PATH /opt/intel/Compiler/11.1/072/lib/intel64 
prepend-path     NLSPATH /opt/intel/mkl/10.2.5.035/lib/em64t/locale/%l_%t/%N 
prepend-path     NLSPATH /opt/intel/Compiler/11.1/072/idb/intel64/locale/%l_%t/%N 
prepend-path     NLSPATH /opt/intel/Compiler/11.1/072/lib/intel64/locale/%l_%t/%N 
setenv           CC icc 
setenv           CXX icpc 
setenv           FC ifort 
setenv           F90 ifort 
setenv           ICC_HOME /opt/intel/Compiler/11.1/072 
setenv           IFORT_HOME /opt/intel/Compiler/11.1/072 
setenv           MKL_HOME /opt/intel/mkl/10.2.5.035 
setenv           LAPACK_INCLUDE -I/opt/intel/mkl/10.2.5.035/include 
setenv           LAPACK_INCLUDE_F95 -I/opt/intel/mkl/10.2.5.035/include/em64t/lp64 
setenv           LINK_LAPACK -L/opt/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/mkl/10.2.5.035/lib/em64t 
setenv           LINK_LAPACK_STATIC -Bstatic -Wl,--start-group /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread 
setenv           LINK_LAPACK95 -L/opt/intel/mkl/10.2.5.035/lib/em64t -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/mkl/10.2.5.035/lib/em64t 
setenv           LINK_LAPACK95_STATIC -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -Bstatic -Wl,--start-group /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread 
-------------------------------------------------------------------

Compiling Source Code on Rossmann

Provided Compilers

Compilers are available on Rossmann for Fortran 77, Fortran 90, Fortran 95, C, and C++. The compilers can produce general-purpose and architecture-specific optimizations to improve performance. These include loop-level optimizations, inter-procedural analysis and cache optimizations. The compilers support automatic and user-directed parallelization of Fortran, C, and C++ applications for multiprocessing execution. More detailed documentation on each compiler set available on Rossmann follows.

On Rossmann, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • PGI 11.8-0
  • ACML
  • OpenMPI 1.6.3

To load the recommended set:

$ module load devel
$ module list

Intel Compiler Set

One or more versions of the Intel compiler set (compilers and associated libraries) are available on Rossmann. To discover which ones:

$ module avail intel

Choose an appropriate Intel module and load it. For example:

$ module load intel

Here are some examples for the Intel compilers:

Language Serial Program MPI Program OpenMP Program
Fortran77
$ ifort myprogram.f -o myprogram
$ mpiifort myprogram.f -o myprogram
$ ifort -openmp myprogram.f -o myprogram
Fortran90
$ ifort myprogram.f90 -o myprogram
$ mpiifort myprogram.f90 -o myprogram
$ ifort -openmp myprogram.f90 -o myprogram
Fortran95 (same as Fortran 90) (same as Fortran 90) (same as Fortran 90)
C
$ icc myprogram.c -o myprogram
$ mpiicc myprogram.c -o myprogram
$ icc -openmp myprogram.c -o myprogram
C++
$ icpc myprogram.cpp -o myprogram
$ mpiicpc myprogram.cpp -o myprogram
$ icpc -openmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, and in Intel's online compiler documentation.

GNU Compiler Set

The official name of the GNU compilers is "GNU Compiler Collection" or "GCC". One or more versions of the GNU compiler set (compilers and associated libraries) are available on Rossmann. To discover which ones:

$ module avail gcc

Choose an appropriate GCC module and load it. For example:

$ module load gcc

An older version of the GNU compiler will be in your path by default. Do NOT use this version. Instead, load a newer version using the command module load gcc.

Here are some examples for the GNU compilers:

Language Serial Program MPI Program OpenMP Program
Fortran77
$ gfortran myprogram.f -o myprogram
$ mpif77 myprogram.f -o myprogram
$ gfortran -fopenmp myprogram.f -o myprogram
Fortran90
$ gfortran myprogram.f90 -o myprogram
$ mpif90 myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f90 -o myprogram
Fortran95
$ gfortran myprogram.f95 -o myprogram
$ mpif90 myprogram.f95 -o myprogram
$ gfortran -fopenmp myprogram.f95 -o myprogram
C
$ gcc myprogram.c -o myprogram
$ mpicc myprogram.c -o myprogram
$ gcc -fopenmp myprogram.c -o myprogram
C++
$ g++ myprogram.cpp -o myprogram
$ mpiCC myprogram.cpp -o myprogram
$ g++ -fopenmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, and in the online GCC documentation.

PGI Compiler Set

One or more versions of the PGI compiler set (compilers and associated libraries) are available on Rossmann. To discover which ones:

$ module avail pgi

Choose an appropriate PGI module and load it. For example:

$ module load pgi

Here are some examples for the PGI compilers:

Language Serial Program MPI Program OpenMP Program
Fortran77
$ pgf77 myprogram.f -o myprogram
$ mpif77 myprogram.f -o myprogram
$ pgf77 -mp myprogram.f -o myprogram
Fortran90
$ pgf90 myprogram.f90 -o myprogram
$ mpif90 myprogram.f90 -o myprogram
$ pgf90 -mp myprogram.f90 -o myprogram
Fortran95
$ pgf95 myprogram.f95 -o myprogram
$ mpif90 myprogram.f95 -o myprogram
$ pgf95 -mp myprogram.f95 -o myprogram
C
$ pgcc myprogram.c -o myprogram
$ mpicc myprogram.c -o myprogram
$ pgcc -mp myprogram.c -o myprogram
C++
$ pgCC myprogram.cpp -o myprogram
$ mpiCC myprogram.cpp -o myprogram
$ pgCC -mp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, and in the online PGI compiler documentation.

Compiling Serial Programs

A serial program is a single process which executes as a sequential stream of instructions on one computer. Compilers capable of serial programming are available for C, C++, and versions of Fortran.

Here is a sample serial program:
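
The following is a minimal sketch in C rather than the exact sample file distributed by ITaP: it prints the run host (obtained with the POSIX gethostname call) and a greeting, matching the output format shown later in the Serial job example. The file name serial_hello.c is only a suggestion.

/* FILENAME:  serial_hello.c (illustrative sketch) */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char runhost[256];

    /* Report which compute node this process ran on. */
    gethostname(runhost, sizeof(runhost));
    printf("Runhost:%s   hello, world\n", runhost);

    return 0;
}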

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load pgi

The following table illustrates how to compile your serial program:

Language Intel Compiler GNU Compiler PGI Compiler
Fortran 77
$ ifort myprogram.f -o myprogram
$ gfortran myprogram.f -o myprogram
$ pgf77 myprogram.f -o myprogram
Fortran 90
$ ifort myprogram.f90 -o myprogram
$ gfortran myprogram.f90 -o myprogram
$ pgf90 myprogram.f90 -o myprogram
Fortran 95
$ ifort myprogram.f90 -o myprogram
$ gfortran myprogram.f95 -o myprogram
$ pgf95 myprogram.f95 -o myprogram
C
$ icc myprogram.c -o myprogram
$ gcc myprogram.c -o myprogram
$ pgcc myprogram.c -o myprogram
C++
$ icc myprogram.cpp -o myprogram
$ g++ myprogram.cpp -o myprogram
$ pgCC myprogram.cpp -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Compiling MPI Programs

A message-passing program is a set of processes (often multiple copies of a single process) that take advantage of distributed-memory systems by communicating with each other via the sending and receiving of messages. The Message-Passing Interface (MPI) is a specific implementation of the message-passing model and is a collection of library functions. OpenMPI, MPICH2, MVAPICH2, and Intel MPI (IMPI) are implementations of the MPI standard. Libraries for these MPI implementations and compilers for C, C++, and versions of Fortran are available.

MPI programs require including a header file:

Language Header Files
Fortran 77
INCLUDE 'mpif.h'
Fortran 90
INCLUDE 'mpif.h'
Fortran 95
INCLUDE 'mpif.h'
C
#include <mpi.h>
C++
#include <mpi.h>

Here is a sample program using MPI:
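
The following is a minimal sketch in C rather than the exact sample file distributed by ITaP: each MPI rank reports its run host and its rank number, matching the output format shown later in the MPI job example. The file name mpi_hello.c is only a suggestion.

/* FILENAME:  mpi_hello.c (illustrative sketch) */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, nranks, namelen;
    char runhost[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Get_processor_name(runhost, &namelen);

    /* Each MPI rank prints its host name and rank number. */
    printf("Runhost:%s   Rank:%d of %d ranks   hello, world\n",
           runhost, rank, nranks);

    MPI_Finalize();
    return 0;
}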

To see the available MPI libraries:

 $ module avail openmpi 
 $ module avail mvapich2 
 $ module avail impi 
 $ module avail mpich2 

The following table illustrates how to compile your message-passing program. Any compiler flags accepted by ifort/icc compilers are compatible with mpif77/mpicc.

Language Intel MPI OpenMPI, MPICH2, or MVAPICH2
Fortran 77
$ mpiifort program.f -o program
$ mpif77 program.f -o program
Fortran 90
$ mpiifort program.f90 -o program
$ mpif90 program.f90 -o program
Fortran 95
$ mpiifort program.f95 -o program
$ mpif90 program.f95 -o program
C
$ mpiicc program.c -o program
$ mpicc program.c -o program
C++
$ mpiicpc program.C -o program
$ mpiCC program.C -o program

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

More documentation on these MPI libraries is available from their respective projects.

Compiling OpenMP Programs

A shared-memory program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. Open Multi-Processing (OpenMP) is a specific implementation of the shared-memory model and is a collection of parallelization directives, library routines, and environment variables. It distributes the work of a process over several cores of a multi-core processor. Compilers which include OpenMP are available for C, C++, and versions of Fortran.

OpenMP programs require including a header file:

Language Header Files
Fortran 77
INCLUDE 'omp_lib.h'
Fortran 90
use omp_lib
Fortran 95
use omp_lib
C
#include <omp.h>
C++
#include <omp.h>

The following sample program illustrates both task parallelism and loop-level (data) parallelism with OpenMP:
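
The following is a minimal sketch in C rather than the exact sample files distributed by ITaP: the first parallel region demonstrates task parallelism (every thread executes the block once), and the parallel for loop demonstrates loop-level (data) parallelism (iterations are divided among threads). The file name omp_hello.c and the printed format are only suggestions.

/* FILENAME:  omp_hello.c (illustrative sketch) */
#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void)
{
    char runhost[256];
    double a[8];
    int i;

    gethostname(runhost, sizeof(runhost));

    /* Task parallelism: every thread in the team executes this block once. */
    #pragma omp parallel
    {
        printf("Runhost:%s   Thread:%d of %d threads   hello, world\n",
               runhost, omp_get_thread_num(), omp_get_num_threads());
    }

    /* Loop-level (data) parallelism: loop iterations are divided among threads. */
    #pragma omp parallel for
    for (i = 0; i < 8; i++)
        a[i] = 2.0 * i;

    printf("a[7] = %g\n", a[7]);
    return 0;
}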

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load pgi

The following table illustrates how to compile your shared-memory program. Any compiler flags accepted by ifort/icc compilers are compatible with OpenMP.

Language Intel Compiler GNU Compiler PGI Compiler
Fortran 77
$ ifort -openmp myprogram.f -o myprogram
$ gfortran -fopenmp myprogram.f -o myprogram
$ pgf77 -mp myprogram.f -o myprogram
Fortran 90
$ ifort -openmp myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f90 -o myprogram
$ pgf90 -mp myprogram.f90 -o myprogram
Fortran 95
$ ifort -openmp myprogram.f90 -o myprogram
$ gfortran -fopenmp myprogram.f95 -o myprogram
$ pgf95 -mp myprogram.f95 -o myprogram
C
$ icc -openmp myprogram.c -o myprogram
$ gcc -fopenmp myprogram.c -o myprogram
$ pgcc -mp myprogram.c -o myprogram
C++
$ icc -openmp myprogram.cpp -o myprogram
$ g++ -fopenmp myprogram.cpp -o myprogram
$ pgCC -mp myprogram.cpp -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Additional documentation on OpenMP is available in each compiler's own documentation.

Compiling Hybrid Programs

A hybrid program combines both message-passing and shared-memory attributes to take advantage of compute clusters with multi-core compute nodes. Libraries for OpenMPI, MPICH2, MVAPICH2, and Intel MPI (IMPI) and compilers which include OpenMP for C, C++, and versions of Fortran are available.

Hybrid programs require including header files:

Language Header Files
Fortran 77
INCLUDE 'omp_lib.h'
INCLUDE 'mpif.h'
Fortran 90
use omp_lib
INCLUDE 'mpif.h'
Fortran 95
use omp_lib
INCLUDE 'mpif.h'
C
#include <mpi.h>
#include <omp.h>
C++
#include <mpi.h>
#include <omp.h>

The following sample program illustrates a hybrid program that combines MPI message passing with OpenMP parallelism:
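
The following is a minimal sketch in C rather than the exact sample files distributed by ITaP: each MPI rank starts an OpenMP thread team, and every thread reports its run host, MPI rank, and thread number. The file name hybrid_hello.c and the printed format are only suggestions.

/* FILENAME:  hybrid_hello.c (illustrative sketch) */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int rank, nranks, namelen;
    char runhost[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Get_processor_name(runhost, &namelen);

    /* Each MPI rank spawns an OpenMP thread team; only the main thread makes MPI calls. */
    #pragma omp parallel
    {
        printf("Runhost:%s   Rank:%d of %d ranks   Thread:%d of %d threads   hello, world\n",
               runhost, rank, nranks,
               omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}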

To see the available MPI libraries:

 $ module avail openmpi 
 $ module avail mvapich2 
 $ module avail impi 
 $ module avail mpich2 

The following table illustrates how to compile your hybrid (MPI/OpenMP) program. Any compiler flags accepted by ifort/icc compilers are compatible with mpif77/mpicc and OpenMP.

Language Intel MPI OpenMPI, MPICH2, or MVAPICH2 with Intel Compiler
Fortran 77
$ mpiifort -openmp myprogram.f -o myprogram
$ mpif77 -openmp myprogram.f -o myprogram
Fortran 90
$ mpiifort -openmp myprogram.f90 -o myprogram
$ mpif90 -openmp myprogram.f90 -o myprogram
Fortran 95
$ mpiifort -openmp myprogram.f90 -o myprogram
$ mpif90 -openmp myprogram.f90 -o myprogram
C
$ mpiicc -openmp myprogram.c -o myprogram
$ mpicc -openmp myprogram.c -o myprogram
C++
$ mpiicpc -openmp myprogram.C -o myprogram
$ mpiCC -openmp myprogram.C -o myprogram
Language OpenMPI, MPICH2, or MVAPICH2 with GNU Compiler OpenMPI, MPICH2, or MVAPICH2 with PGI Compiler
Fortran 77
$ mpif77 -fopenmp myprogram.f -o myprogram
$ mpif77 -mp myprogram.f -o myprogram
Fortran 90
$ mpif90 -fopenmp myprogram.f90 -o myprogram
$ mpif90 -mp myprogram.f90 -o myprogram
Fortran 95
$ mpif90 -fopenmp myprogram.f95 -o myprogram
$ mpif90 -mp myprogram.f95 -o myprogram
C
$ mpicc -fopenmp myprogram.c -o myprogram
$ mpicc -mp myprogram.c -o myprogram
C++
$ mpiCC -fopenmp myprogram.C -o myprogram
$ mpiCC -mp myprogram.C -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Provided Libraries

Some mathematical libraries are available on Rossmann. More detailed documentation about the libraries available on Rossmann follows.

Intel Math Kernel Library (MKL)

Intel Math Kernel Library (MKL) contains ScaLAPACK, LAPACK, Sparse Solver, BLAS, Sparse BLAS, CBLAS, GMP, FFTs, DFTs, VSL, VML, and Interval Arithmetic routines. MKL resides in the directory stored in the environment variable MKL_HOME, after loading a version of the Intel compiler with module.

When you use module load to activate an Intel compiler, your shell environment gains several variables that help you link applications with MKL. Here are some example combinations of simplified linking options:

$ module load intel
$ echo $LINK_LAPACK
-L${MKL_HOME}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread 

$ echo $LINK_LAPACK95 
-L${MKL_HOME}/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread 

ITaP recommends that you use the provided variables to define MKL linking options in your compiling procedures. The Intel compiler modules also provide two other environment variables, LINK_LAPACK_STATIC and LINK_LAPACK95_STATIC, that you may use if you need to link MKL statically.
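
As one hedged illustration, the small C program below multiplies two 2x2 matrices with the standard CBLAS routine cblas_dgemm, declared in MKL's mkl.h header. After module load intel, a command along the lines of icc mkl_dgemm.c $LINK_LAPACK -o mkl_dgemm would build it against MKL. The file name and matrix values are only examples.

/* FILENAME:  mkl_dgemm.c (illustrative sketch) */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Compute C = 1.0 * A * B + 0.0 * C for two 2x2 row-major matrices. */
    double A[4] = { 1.0, 2.0, 3.0, 4.0 };
    double B[4] = { 5.0, 6.0, 7.0, 8.0 };
    double C[4] = { 0.0, 0.0, 0.0, 0.0 };

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2, 1.0, A, 2, B, 2, 0.0, C, 2);

    printf("C = [ %g %g ; %g %g ]\n", C[0], C[1], C[2], C[3]);
    return 0;
}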

ITaP recommends that you use dynamic linking of libguide. If so, define LD_LIBRARY_PATH such that you are using the correct version of libguide at run time. If you use static linking of libguide (discouraged), then:

  • If you use the Intel compilers, link in the libguide version that comes with the compiler (use the -openmp option).
  • If you do not use the Intel compilers, link in the libguide version that comes with the Intel MKL above.

Additional documentation on the Intel MKL is available from Intel.

Mixing Fortran, C, and C++ Code on Unix

You may write different parts of a computing application in different programming languages. For example, an application might incorporate older, legacy code that performs numerical calculations written in Fortran. System functions might use C. A newer main program that binds together the older code might use C++ to take advantage of its object orientation. This section illustrates a few simple examples.

For more information about mixing programming languages, consult your compiler's documentation.

Using cpp with Fortran

If the source file ends with .F, .fpp, or .FPP, cpp automatically preprocesses the source code before compilation. If you want to use the C preprocessor with source files that do not end with .F, use the following compiler option to specify the filename suffix:

  • GNU Compilers: -x f77-cpp-input
    Note that preprocessing does not extend to the contents of files included by an "INCLUDE" directive. You must use the #include preprocessor directive instead.
    For example, to preprocess source files that end with .f:
    $ gfortran -x f77-cpp-input myprogram.f
    
  • Intel Compilers: -cpp
    To tell the compiler to link using the C++ run-time libraries that come with gcc or with icc, use -cxxlib-gcc or -cxxlib-icc, respectively:
    $ ... -cxxlib-gcc
    $ ... -cxxlib-icc
    
    For example, to preprocess source files that end with .f:
    $ ifort -cpp myprogram.f
    

Generally, it is advisable to rename your file from myprogram.f to myprogram.F. The preprocessor then automatically runs when you compile the file.

For more information on combining C/C++ and Fortran, consult your compiler's documentation.

C Program Calling Subroutines in Fortran, C, and C++

A C language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine. The C program calls the Fortran routine with the underscore character.

Fortran uses pass-by-reference while C uses pass-by-value. Therefore, to pass a value from a Fortran routine to a C program requires the argument in the call to the Fortran routine to be a pointer (ampersand "&"). To pass a value from a C++ routine to a C program, the C++ routine may use the pass-by-reference syntax (ampersand "&") of C++ while the C program again specifies a pointer (ampersand "&") in the call to the C++ routine.

The C++ compiler must know at the time of compiling the C++ routine that the C program will invoke the C++ routine with the C-style interface rather than the C++ interface.

The source files main.c, f90.f90, c.c, and cpp.cpp illustrate these technical details.

Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

Compiler Intel GNU PGI
C Main Program
$ module load intel
$ icc -c main.c
$ ifort -c f90.f90
$ icc -c c.c
$ icc -c cpp.cpp
$ icc -lstdc++ main.o f90.o c.o cpp.o
$ module load gcc
$ gcc -c main.c
$ gfortran -c f90.f90
$ gcc -c c.c
$ g++ -c cpp.cpp
$ gcc -lstdc++ main.o f90.o c.o cpp.o
$ module load pgi
$ pgcc -c main.c
$ pgcc -c c.c
$ pgCC -c cpp.cpp
$ pgf90 -Mnomain main.o c.o cpp.o f90.f90

The results show that each routine successfully returns a different character to the main program:

$ a.out
main(), initial value:               chr=X
main(), after function subr_f_():    chr=f
main(), after function func_c():     chr=c
main(), after function func_cpp():   chr=+
Exit main.c

C++ Program Calling Subroutines in Fortran, C, and C++

A C++ language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine. The C++ program calls the Fortran routine with the underscore character.

Fortran uses pass-by-reference while C++ uses pass-by-value. Therefore, to pass a value from a Fortran routine to a C++ program requires the argument in the call to the Fortran routine to be a pointer (ampersand "&"). To pass a value from a C routine to a C++ program, the C routine must declare a parameter as a pointer (asterisk "*") while the C++ program again specifies a pointer (ampersand "&") in the call to the C routine.

The C++ compiler must know at the time of compiling the C++ program that the C++ program will invoke the Fortran and C routines with the C-style interface rather than the C++ interface.

The source files main.cpp, f90.f90, c.c, and cpp.cpp illustrate these technical details.

Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

Compiler Intel GNU PGI
C++ Main Program
$ module load intel
$ icc -c main.cpp
$ ifort -c f90.f90
$ icc -c c.c
$ icc -c cpp.cpp
$ icc -lstdc++ main.o f90.o c.o cpp.o
$ module load gcc
$ g++ -c main.cpp
$ gfortran -c f90.f90
$ gcc -c c.c
$ g++ -c cpp.cpp
$ g++ main.o f90.o c.o cpp.o
$ module load pgi
$ pgCC -c main.cpp
$ pgf90 -c f90.f90
$ pgcc -c c.c
$ pgCC -c cpp.cpp
$ pgCC -L../lib main.o c.o cpp.o f90.o -pgf90libs

The results show that each routine successfully returns a different character to the main program:

$ a.out
main(), initial value:               chr=X
main(), after function subr_f_():    chr=f
main(), after function func_c():     chr=c
main(), after function func_cpp():   chr=+
Exit main.cpp

Fortran Program Calling Subroutines in Fortran, C, and C++

A Fortran language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine, so the definitions of the C and C++ routines must include the underscore. The Fortran program calls these routines without the underscore character in the Fortran source code.

Fortran uses pass-by-reference while C uses pass-by-value. Therefore, to pass a value from a C routine to a Fortran program requires the parameter of the C routine to be a pointer (asterisk "*") in the C routine's definition. To pass a value from a C++ routine to a Fortran program, the C++ routine may use the pass-by-reference syntax (ampersand "&") of C++ in its definition.

The C++ compiler must know at the time of compiling the C++ routine that the Fortran program will invoke the C++ routine with the C-style interface rather than the C++ interface.

The source files main.f90, f90.f90, c.c, and cpp.cpp illustrate these technical details.

Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

Compiler Intel GNU PGI
Fortran 90 Main Program
$ module load intel
$ ifort -c main.f90
$ ifort -c f90.f90
$ icc -c c.c
$ icc -c cpp.cpp
$ ifort -lstdc++ main.o f90.o c.o cpp.o
$ module load gcc
$ gfortran -c main.f90
$ gfortran -c f90.f90
$ gcc -c c.c
$ g++ -c cpp.cpp
$ gfortran -lstdc++ main.o c.o cpp.o f90.o
$ module load pgi
$ pgf90 -c main.f90
$ pgf90 -c f90.f90
$ pgcc -c c.c
$ pgCC -c cpp.cpp
$ pgf90 main.o c.o cpp.o f90.o

The results show that each routine successfully returns a different character to the main program:

$ a.out
 main(), initial value:               chr=X
 main(), after function subr_f():     chr=f
 main(), after function subr_c():     chr=c
 main(), after function func_cpp():   chr=+
 Exit mixlang

Running Jobs on Rossmann

There are two methods for submitting jobs to the Rossmann community cluster. First, you may use the portable batch system (PBS) to submit jobs directly to a queue on Rossmann; PBS performs the job scheduling. Jobs may be serial, message-passing, shared-memory, or hybrid (message-passing + shared-memory) programs. You may use either the batch or interactive mode to run your jobs; use batch mode for finished programs and interactive mode only for debugging. Second, since the Rossmann cluster is part of BoilerGrid, you may submit serial jobs to BoilerGrid and specifically request compute nodes on Rossmann.

Running Jobs via PBS

The Portable Batch System (PBS) is a richly featured workload management system providing a job scheduling and job management interface on computing resources, including Linux clusters. With PBS, a user requests resources and submits a job to a queue. The system will then take jobs from queues, allocate the necessary nodes, and execute them as efficiently as it can.

Do NOT run large, long, multi-threaded, parallel, or CPU-intensive jobs on a front-end login host. All users share the front-end hosts, and running anything but the smallest test job will negatively impact everyone's ability to use Rossmann. Always use PBS to submit your work as a job. You may even submit interactive sessions as jobs. This section of documentation will explain how to use PBS.

Tips

  • Remember that ppn can not be larger than the number of processor cores on each node.
  • If you compiled your own code, you must module load that same compiler from your job submission file. However, it is not necessary to load the standard compiler module if you load the corresponding compiler module with parallel libraries included.
  • To see a list of the nodes which ran your job: cat $PBS_NODEFILE
  • The order of processor cores is random. There is no way to tell which processor will do what or in which order in a parallel program.
  • If you use the tcsh and csh shells and if a .logout file exists in your home directory, the exit status of your jobs will be that of the .logout script, not the job submission file. This may impact any interjob dependencies. To preserve the job exit status, remove the .logout file.

Queues

Rossmann, as a community cluster, has one or more queues dedicated to each partner who has purchased access to the cluster. These queues provide partners with priority access to their portion of the cluster. Additionally, community clusters provide a "standby" queue which is available to all cluster users. This "standby" queue allows users to utilize portions of the cluster that would otherwise be idle, but at a lower priority than partner-queue jobs, and with a relatively short time limit, to ensure "standby" jobs will not be able to tie up resources and prevent partner-queue jobs from running quickly.

To see a list of all queues on Rossmann that you may submit to, use the qlist command:

$ qlist 

                          Current Number of Cores
Queue                 Total     Queue   Run     Free         Max Walltime
===============    ====================================     ==============
myqueue                  24	48	12	12		720:00:00
standby               9,584	7,384	4,678	98		4:00:00

This lists each queue you can submit to, the number of cores allocated to the queue, the total number of cores queued in jobs waiting to run, how many cores are in use, and how many are available to run jobs. The maximum walltime you may request is also listed. This command can be used to get a general idea of how busy a queue is and how long you may have to wait for your job to start.

Job Submission File

To submit work to a PBS queue, you must first create a job submission file. This job submission file is essentially a simple shell script. It will set any required environment variables, load any necessary modules, create or modify files and directories in your scratch space, and invoke any applications that you need. However, a job submission file can be as simple as the path to your application:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Print the hostname of the compute node on which this job is running.
/bin/hostname

Or, as simple as listing the names of compute nodes assigned to your job:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# PBS_NODEFILE contains the names of assigned compute nodes.
cat $PBS_NODEFILE

PBS sets several potentially useful environment variables which you may use within your job submission files. Here is a list of some:

Name Description
PBS_O_WORKDIR Absolute path of the current working directory when you submitted this job
PBS_JOBID Job ID number assigned to this job by the batch system
PBS_JOBNAME Job name supplied by the user
PBS_NODEFILE File containing the list of nodes assigned to this job
PBS_O_HOST Hostname of the system where you submitted this job
PBS_O_QUEUE Name of the original queue to which you submitted this job
PBS_O_SYSTEM Operating system name given by uname -s where you submitted this job
PBS_ENVIRONMENT "PBS_BATCH" if this job is a batch job, or "PBS_INTERACTIVE" if this job is an interactive job

Here is an example of a commonly used PBS variable, making sure a job runs from within the same directory that you submitted it from:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Change to the directory from which you originally submitted this job.
cd $PBS_O_WORKDIR

# Print out the current working directory path.
pwd

You may also find the need to load a module to run a job on a compute node. Loading a module on a front end does NOT automatically load that module on the compute node where a job runs. You must use the job submission file to load a module on the compute node:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Load the module for NetPBM.
module load netpbm

# Convert a PostScript file to GIF format using NetPBM tools.
pstopnm myfilename.ps | ppmtogif > myfilename.gif

Job Submission

Once you have a job submission file, you may submit this script to PBS using the qsub command. PBS will find an available processor core or a set of processor cores and run your job there, or leave your job in a queue until some become available. At submission time, you may also optionally specify many other attributes or job requirements you have regarding where your jobs will run.

To submit your serial job to one processor core on one compute node with no special requirements:

$ qsub myjobsubmissionfile

To submit your job to a specific queue:

$ qsub -q myqueuename myjobsubmissionfile

By default, each job receives 30 minutes of wall time for its execution. The wall time is the total time in real clock time (not CPU cycles) that you believe your job will need to run to completion. If you know that your job will not need more than a certain amount of time to run, it is very much to your advantage to request less than the maximum allowable wall time, as this may allow your job to schedule and run sooner. To request the specific wall time of 1 hour and 30 minutes:

$ qsub -l walltime=01:30:00 myjobsubmissionfile

To submit your job with your currently-set environment variables:

$ qsub -V myjobsubmissionfile

The nodes resource indicates how many compute nodes you would like reserved for your job. The node property ppn specifies how many processor cores you need on each compute node. Each compute node in Rossmann has 24 processor cores. Detailed explanations regarding the distribution of your job across different compute nodes for parallel programs appear in the sections covering specific parallel programming libraries.

To request 2 compute nodes with 4 processor cores per node:

$ qsub -l nodes=2:ppn=4 myjobsubmissionfile

Here is a typical list of compute node names from a qsub command requesting 2 compute nodes and 4 processor cores:

rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638

Note that if you request more than ppn=24 on Rossmann, your job will never run, because Rossmann compute nodes only have 24 processor cores each.

Normally, compute nodes running your job may also be running jobs from other users. ITaP research systems have many processor cores in each compute node, so node sharing allows more efficient use of the system. However, if you have special needs that prohibit others from effectively sharing a compute node with your job, such as needing all of the memory on a compute node, you may request exclusive access to any compute nodes allocated to your job.

To request exclusive access to a compute node, set ppn to the maximum number of processor cores physically available on a compute node:

$ qsub -l nodes=1:ppn=24 myjobsubmissionfile

If more convenient, you may also specify any command line options to qsub from within your job submission file, using a special form of comment:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

#PBS -V
#PBS -q myqueuename
#PBS -l nodes=1:ppn=24
#PBS -l walltime=01:30:00
#PBS -N myjobname

# Print the hostname of the compute node on which this job is running.
/bin/hostname

If an option is present in both your job submission file and on the command line, the option on the command line will take precedence.

After you submit your job with qsub, it can reside in a queue for minutes, hours, or even weeks. How long it takes for a job to start depends on the specific queue, the number of compute nodes requested, the amount of wall time requested, and what other jobs already waiting in that queue requested as well. It is impossible to say for sure when any given job will start. For best results, request no more resources than your job requires.

PBS catches only output written to standard output and standard error. Standard output (output normally sent to the screen) will appear in your directory in a file whose extension begins with the letter "o", for example myjobsubmissionfile.o1234, where "1234" represents the PBS job ID. Errors that occurred during the job run and written to standard error (output also normally sent to the screen) will appear in your directory in a file whose extension begins with the letter "e", for example myjobsubmissionfile.e1234. Often, the error file is empty. If your job wrote results to a file, those results will appear in that file.

Parallel applications may require special care in the selection of PBS resources. Please refer to the sections that follow for details on how to run parallel applications with various parallel libraries.

Job Status

The command qstat -a will list all jobs currently queued or running and some information about each:

$ qstat -a

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
107025.rossmann user123  standby  hello         --    1   8    --  00:05 Q   --
115505.rossmann user456  ncn      job4         5601   1   1    --  600:0 R 575:0
...
189479.rossmann user456  standby  AR4b          --    5  40    --  04:00 H   --
189481.rossmann user789  standby  STDIN        1415   1   1    --  00:30 R 00:07
189483.rossmann user789  standby  STDIN        1758   1   1    --  00:30 R 00:07
189484.rossmann user456  standby  AR4b          --    5  40    --  04:00 H   --
189485.rossmann user456  standby  AR4b          --    5  40    --  04:00 Q   --
189486.rossmann user123  tg_workq STDIN         --    1   1    --  12:00 Q   --
189490.rossmann user456  standby  job7        26655   1   8    --  04:00 R 00:06
189491.rossmann user123  standby  job11         --    1   8    --  04:00 Q   --

The status of each job listed appears in the "S" column toward the right. Possible status codes are: "Q" = Queued, "R" = Running, "C" = Completed, and "H" = Held.

To see only your own jobs, use the -u option to qstat and specify your own username:

$ qstat -a -u myusername

rossmann-adm.rcac.purdue.edu:
                                                              Req'd  Req'd   Elap
Job ID          Username   Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- ---------- -------- ---------- ------ --- --- ------ ----- - -----
182792.rossmann myusername standby  job1        28422   1   4    --  23:00 R 20:19
185841.rossmann myusername standby  job2        24445   1   4    --  23:00 R 20:19
185844.rossmann myusername standby  job3        12999   1   4    --  23:00 R 20:18
185847.rossmann myusername standby  job4        13151   1   4    --  23:00 R 20:18

To retrieve useful information about your queued or running job, use the checkjob command with your job's ID number. The output should look similar to the following:

$ checkjob -v 163000

job 163000 (RM job '163000.rossmann-adm.rcac.purdue.edu')

AName: test
State: Idle 
Creds:  user:myusername  group:mygroup  class:myqueue
WallTime:   00:00:00 of 20:00:00
SubmitTime: Wed Apr 18 09:08:37
  (Time Queued  Total: 1:24:36  Eligible: 00:00:23)

NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 2
Total Requested Nodes: 1

Req[0]  TaskCount: 2  Partition: ALL  
TasksPerNode: 2  NodeCount:  1


Notification Events: JobFail

IWD:            /home/myusername/gaussian
UMask:          0000 
OutputFile:     rossmann-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.o163000
ErrorFile:      rossmann-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.e163000
User Specified Partition List:   rossmann-adm,SHARED
Partition List: rossmann-adm
SrcRM:          rossmann-adm  DstRM: rossmann-adm  DstRMJID: 163000.rossmann-adm.rcac.purdue.edu
Submit Args:    -l nodes=1:ppn=2,walltime=20:00:00 -q myqueue
Flags:          RESTARTABLE
Attr:           checkpoint
StartPriority:  1000
PE:             2.00
NOTE:  job violates constraints for partition rossmann-adm (job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160))

BLOCK MSG: job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160) (recorded at last scheduling iteration)

There are several useful bits of information in this output.

  • State lets you know if the job is Idle, Running, Completed, or Held.
  • WallTime will show how long the job has run and its maximum time.
  • SubmitTime is when the job was submitted to the cluster.
  • Total Requested Tasks is the total number of cores used for the job.
  • Total Requested Nodes and NodeCount are the number of nodes used for the job.
  • TasksPerNode is the number of cores used per node.
  • IWD is the job's working directory.
  • OutputFile and ErrorFile are the locations of stdout and stderr of the job, respectively.
  • Submit Args will show the arguments given to the qsub command.
  • NOTE/BLOCK MSG will show details on why the job isn't running. The above error says that all the cores are in use on that queue and the job has to wait. Other errors may give insight as to why the job fails to start or is held.

To view the output of a running job, use the qpeek command with your job's ID number. The -f option will continually output to the screen similar to tail -f, while qpeek without options will just output the whole file so far. Here is an example output from an application:

$ qpeek -f 1651025
TIMING: 600  CPU: 97.0045, 0.0926592/step  Wall: 97.0045, 0.0926592/step, 0.11325 hours remaining, 809.902344 MB of memory in use.
ENERGY:     600    359272.8746    280667.4810     81932.7038      5055.7519       -4509043.9946    383233.0971         0.0000         0.0000    947701.9550       -2451180.1312       298.0766  -3398882.0862  -2442581.9707       298.2890           1125.0475        77.0325  10193721.6822         3.5650         3.0569

TIMING: 800  CPU: 118.002, 0.104987/step  Wall: 118.002, 0.104987/step, 0.122485 hours remaining, 809.902344 MB of memory in use.
ENERGY:     800    360504.1138    280804.0922     82052.0878      5017.1543       -4511471.5475    383214.3057         0.0000         0.0000    946597.3980       -2453282.3958       297.7292  -3399879.7938  -2444652.9520       298.0805            978.4130        67.0123  10193578.8030        -0.1088         0.2596

TIMING: 1000  CPU: 144.765, 0.133817/step  Wall: 144.765, 0.133817/step, 0.148686 hours remaining, 809.902344 MB of memory in use.
ENERGY:    1000    361525.2450    280225.2207     81922.0613      5126.4104       -4513315.2802    383460.2355         0.0000         0.0000    947232.8722       -2453823.2352       297.9291  -3401056.1074  -2445219.8163       297.9184            823.8756        43.2552  10193174.7961        -0.7191        -0.2392
...

Job Hold

To place a hold on a job before it starts running, use the qhold command:

$ qhold myjobid

Once a job has started running it can not be placed on hold.

To release a hold on a job, use the qrls command:

$ qrls myjobid

You find the job ID using the qstat command as explained in the PBS Job Status section.

Job Dependencies

Job dependencies may be configured to ensure jobs start in a specified order. Jobs can be configured to run after other job state changes, such as when the job starts or the job ends.

These examples illustrate setting dependencies in several ways. Typically dependencies are set by capturing and using the job ID from the last job submitted.

To run a job after job myjobid has started:

$ qsub -W depend=after:myjobid myjobsubmissionfile

To run a job after job myjobid ends without error:

$ qsub -W depend=afterok:myjobid myjobsubmissionfile

To run a job after job myjobid ends with errors:

$ qsub -W depend=afternotok:myjobid myjobsubmissionfile

To run a job after job myjobid ends with or without errors:

$ qsub -W depend=afterany:myjobid myjobsubmissionfile

To set more complex dependencies on multiple jobs and conditions:

$ qsub -W depend=after:myjobid1:myjobid2:myjobid3,afterok:myjobid4 myjobsubmissionfile

Dependencies are an automated way of holding and releasing jobs. Jobs with a dependency are held until the condition is satisfied; once the condition is satisfied, they become eligible to run and must still wait in the queue as normal.

Job Cancellation

To stop a job before it finishes or remove it from a queue, use the qdel command:

$ qdel myjobid

You find the job ID using the qstat command as explained in the PBS Job Status section.

Examples

To submit jobs successfully, you must understand how to request the right computing resources. This section contains examples of specific types of PBS jobs. These examples illustrate requesting various groupings of nodes and processor cores, using various parallel libraries, and running interactive jobs. You may wish to look here for an example that is most similar to your application and use a modified version of that example's job submission file for your jobs.

Batch

This simple example submits the job submission file hello.sub to the standby queue on Rossmann and requests 4 nodes:

$ qsub -q standby -l nodes=4,walltime=00:01:00 hello.sub
99.rossmann-adm.rcac.purdue.edu

Remember that ppn can not be larger than the number of processor cores on each node.

After your job finishes running, the ls command will show two new files in your directory, the .o and .e files:

$ ls -l
hello
hello.c
hello.out
hello.sub
hello.sub.e99
hello.sub.o99

If everything went well, then the file hello.sub.e99 will be empty, since it contains any error messages your program gave while running. The file hello.sub.o99 contains the output from your program.

Using Environment Variables in a Job

If you would like to see the value of the environment variables from within a PBS job, you can prepare a job submission file with an appropriate filename, here named env.sub:

#!/bin/sh -l
# FILENAME:  env.sub

# Request four nodes, 1 processor core on each.
#PBS -l nodes=4:ppn=1,walltime=00:01:00
	
# Change to the directory from which you submitted your job.
cd $PBS_O_WORKDIR
	
# Show details, especially nodes.
# The results of most of the following commands appear in the error file.
echo $PBS_O_HOST
echo $PBS_O_QUEUE
echo $PBS_O_SYSTEM
echo $PBS_O_WORKDIR
echo $PBS_ENVIRONMENT
echo $PBS_JOBID
echo $PBS_JOBNAME

# PBS_NODEFILE contains the names of assigned compute nodes.
cat $PBS_NODEFILE

Submit this job:

$ qsub env.sub

Multiple Node

This section illustrates various requests for one or multiple compute nodes and ways of allocating the processor cores on these compute nodes. Each example submits a job submission file (myjobsubmissionfile.sub) to a batch session. The job submission file contains a single command cat $PBS_NODEFILE to show the names of the compute node(s) allocated. The list of compute node names indicates the geometry chosen for the job:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE

All examples use the default queue of the cluster.

One processor core on any compute node

A job shares the other resources, in particular the memory, of the compute node with other jobs. This request is typical of a serial job:

$ qsub -l nodes=1 myjobsubmissionfile.sub

Compute node allocated:

rossmann-a639

Two processor cores on any compute nodes

This request is typical of a distributed-memory (MPI) job:

$ qsub -l nodes=2 myjobsubmissionfile.sub

Compute node(s) allocated:

rossmann-a639
rossmann-a638

All processor cores on one compute node

The option ppn can not be larger than the number of cores on each compute node on the machine in question. This request is typical of a shared-memory (OpenMP) job:

$ qsub -l nodes=1:ppn=24 myjobsubmissionfile.sub

Compute node allocated:

rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637

All processor cores on any two compute nodes

The option ppn can not be larger than the number of processor cores on each compute node on the machine in question. This request is typical of a hybrid (distributed-memory and shared-memory) job:

$ qsub -l nodes=2:ppn=24 myjobsubmissionfile.sub

Compute nodes allocated:

rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638

Multinode geometry from option nodes is one processor core per node (scattered placement)

$ qsub -l nodes=8 myjobsubmissionfile.sub

rossmann-a001
rossmann-a003
rossmann-a004
rossmann-a005
rossmann-a006
rossmann-a007
rossmann-a008
rossmann-a009

Multinode geometry from option procs is one or more processor cores per node (free placement)

$ qsub -l procs=8 myjobsubmissionfile.sub

The placement of processor cores can range from all on one compute node (packed) to all on unique compute nodes (scattered). A few examples follow:

rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001

rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a002
rossmann-a002
rossmann-a003
rossmann-a004
rossmann-a004

rossmann-a000
rossmann-a001
rossmann-a002
rossmann-a003
rossmann-a004
rossmann-a005
rossmann-a006
rossmann-a007

Four compute nodes, each with two processor cores

$ qsub -l nodes=4:ppn=2 myjobsubmissionfile.sub

rossmann-a001
rossmann-a001
rossmann-a003
rossmann-a003
rossmann-a004
rossmann-a004
rossmann-a005
rossmann-a005

Eight processor cores can come from any four compute nodes

$ qsub -l nodes=4 -l procs=8 myjobsubmissionfile.sub

rossmann-a001
rossmann-a001
rossmann-a003
rossmann-a003
rossmann-a004
rossmann-a004
rossmann-a005
rossmann-a005

Exclusive access to one compute node, using one processor core

Achieving this geometry requires modifying the job submission file, here named myjobsubmissionfile.sub:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE
uniq <$PBS_NODEFILE >nodefile
echo " "
cat nodefile

To gain exclusive access to a compute node, specify all processor cores that are physically available on a compute node:

$ qsub -l nodes=1:ppn=24 myjobsubmissionfile.sub

rossmann-a005
rossmann-a005
...
rossmann-a005

rossmann-a005

This request is typical of a serial job that needs access to all of the memory of a compute node.

Specific Types of Nodes

You may also request that a job be run on specific nodes based on various quantities such as sub-cluster type, node memory and/or job properties.

These examples submit a job submission file, here named myjobsubmissionfile.sub, to the default queue. The job submission file contains a single command (cat $PBS_NODEFILE) to show the allocated node(s).

Example: a job requires a compute node in an "A" sub-cluster:

$ qsub -l nodes=1:A myjobsubmissionfile.sub 

Compute node allocated:

rossmann-a009

Example: a job requires a compute node with 48 GB of physical memory:

$ qsub -l nodes=1:nodemem48gb myjobsubmissionfile.sub 

Compute node allocated:

rossmann-a009

Example: a job declares that it would require 48 GB of physical memory for itself (and thus needs a node that has more than that):

$ qsub -l nodes=1,pmem=48gb myjobsubmissionfile.sub 

Compute node allocated:

rossmann-b009

Note that the pmem=48gb job above does not run on a 48 GB node. Since the operating system requires some memory for itself (possibly about 2 GB, leaving just 46 GB free on a 48 GB node), a pmem=48gb job will not fit into such a node, and PBS will place the job on a larger-memory node. If the requested pmem= value is greater than the free RAM in the largest available node, the job will never start.

The first two examples above (the A and nodemem48gb keywords) refer to node properties, while the third example above (the pmem=48gb keyword) declares a job property. By using node properties, you can direct your job to the desired node type ("give me a 48 GB node" or "give me a node in sub-cluster A"). Using job properties allows you to state what your job requires and let the scheduler find any node which meets these requirements (i.e. "give me a node that is capable of fitting my 48 GB job"). The former will always go to 48 GB nodes, while the latter may end up on either of 96 or 192 GB nodes, whichever is available.

Refer to the Detailed Hardware Specification section for a list of available sub-clusters and their respective per-node memory sizes to use with the nodemem keyword.

Interactive Job

Interactive jobs can run on compute nodes. You can start interactive jobs either with specific time constraints (walltime=hh:mm:ss) or with the default time constraints of the queue to which you submit your job.

If you request an interactive job without a wall time option, PBS assigns to your job the default wall time limit for the queue to which you submit. If this is shorter than the time you actually need, your job will terminate before completion. If, on the other hand, this time is longer than what you actually need, you are effectively withholding computing resources from other users. For this reason, it is best to always pass a reasonable wall time value to PBS for interactive jobs.

Once your interactive job starts, you may use that connection as an interactive shell and invoke whatever other programs or other commands you wish. To submit an interactive job with one hour of wall time, use the -I option to qsub:

$ qsub -I -l walltime=01:00:00
qsub: waiting for job 100.rossmann-adm.rcac.purdue.edu to start
qsub: job 100.rossmann-adm.rcac.purdue.edu ready

If you need to use a remote X11 display from within your job (see the SSH X11 Forwarding Section), add the -v DISPLAY option to qsub as well:

$ qsub -I -l walltime=01:00:00 -v DISPLAY
qsub: waiting for job 101.rossmann-adm.rcac.purdue.edu to start
qsub: job 101.rossmann-adm.rcac.purdue.edu ready

To quit your interactive job:

logout

Serial

A serial job is a single process whose steps execute as a sequential stream of instructions on one processor core.

This section illustrates how to use PBS to submit to a batch session one of the serial programs compiled in the section Compiling Serial Programs. There is no difference in running a Fortran, C, or C++ serial program after compiling and linking it into an executable file.

Suppose that you named your executable file serial_hello. Prepare a job submission file with an appropriate filename, here named serial_hello.sub:

#!/bin/sh -l
# FILENAME:  serial_hello.sub

module load devel
cd $PBS_O_WORKDIR

./serial_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

Submit the serial job to the default queue on Rossmann and request 1 compute node with 1 processor core and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00 ./serial_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
serial_hello
serial_hello.c
serial_hello.sub
serial_hello.sub.emyjobid
serial_hello.sub.omyjobid

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:rossmann-a639.rcac.purdue.edu   hello, world

If the job failed to run, then view error messages in the file serial_hello.sub.emyjobid.

If a serial job uses a lot of memory and finds the memory of a compute node overcommitted while sharing the compute node with other jobs, specify the number of processor cores physically available on the compute node to gain exclusive use of the compute node:

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 serial_hello.sub

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:rossmann-a639.rcac.purdue.edu   hello, world

MPI

A message-passing job is a set of processes (often multiple copies of a single process) that take advantage of distributed-memory systems by communicating with each other via the sending and receiving of messages. Work occurs across several compute nodes of a distributed-memory system. The Message-Passing Interface (MPI) is a standardized realization of the message-passing model, defined as a collection of library functions. OpenMPI, MPICH2, MVAPICH2, and Intel MPI (IMPI) are implementations of the MPI standard.

This section illustrates how to use PBS to submit to a batch session one of the MPI programs compiled in the section Compiling MPI Programs. There is no difference in running a Fortran, C, or C++ MPI program after compiling and linking it into an executable file.
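
For reference, an MPI program that produces output of the form shown below could be sketched as follows. This is only an approximation for illustration; the actual mpi_hello.c source appears in the Compiling MPI Programs section.

/* FILENAME:  mpi_hello.c  (illustrative sketch, not the exact source) */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, nranks, len;
    char runhost[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* this process's rank            */
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);   /* total number of MPI ranks      */
    MPI_Get_processor_name(runhost, &len);    /* compute node running this rank */

    printf("Runhost:%s   Rank:%d of %d ranks   hello, world\n",
           runhost, rank, nranks);

    MPI_Finalize();
    return 0;
}

Compile a program like this with the compiler wrapper provided by the loaded MPI module, for example mpicc mpi_hello.c -o mpi_hello.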

The path to relevant MPI libraries is not set up on any run host by default. Using module load is the preferred way to access these libraries. Use module avail to see all software packages installed on Rossmann, including MPI library packages. Then, to use one of the available MPI modules, enter the module load command.

Suppose that you named your executable file mpi_hello. Prepare a job submission file with an appropriate filename, here named mpi_hello.sub:

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

module load devel
cd $PBS_O_WORKDIR

mpiexec -n 48 ./mpi_hello

You can load any MPI library/compiler module that is available on Rossmann. This example uses the OpenMPI library.

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

You invoke an MPI program with the mpiexec command. The number of processes requested with mpiexec -n is usually equal to the number of MPI ranks of the application and should typically be equal to the total number of processor cores you request from PBS (more on this below).

Submit the MPI job to the default queue on Rossmann and request 2 compute nodes with all 24 processor cores and 24 MPI ranks on each compute node and 1 minute of wall time. This will use two complete compute nodes of the Rossmann cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 ./mpi_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
mpi_hello
mpi_hello.c
mpi_hello.sub
mpi_hello.sub.emyjobid
mpi_hello.sub.omyjobid

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:rossmann-a010.rcac.purdue.edu   Rank:0 of 48 ranks   hello, world
Runhost:rossmann-a010.rcac.purdue.edu   Rank:1 of 48 ranks   hello, world
   ...
Runhost:rossmann-a010.rcac.purdue.edu   Rank:23 of 48 ranks   hello, world
Runhost:rossmann-a011.rcac.purdue.edu   Rank:24 of 48 ranks   hello, world
Runhost:rossmann-a011.rcac.purdue.edu   Rank:25 of 48 ranks   hello, world
   ...
Runhost:rossmann-a011.rcac.purdue.edu   Rank:47 of 48 ranks   hello, world

If the job failed to run, then view error messages in the file mpi_hello.sub.emyjobid.

If an MPI job uses a lot of memory and 24 MPI ranks per compute node use all of the memory of the compute nodes, request more compute nodes and run fewer MPI ranks (processor cores) on each one, while keeping the total number of MPI ranks unchanged.

Submit the job to the default queue with double the number of compute nodes and half the number of MPI ranks per compute node (the total number of MPI ranks remains unchanged). Use the -n (node-exclusive) flag to ensure no other jobs share your nodes and use the extra memory required by your job.

$ qsub -l nodes=4:ppn=12,walltime=00:01:00 -n ./mpi_hello.sub

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:rossmann-c010.rcac.purdue.edu   Rank:0 of 48 ranks   hello, world
Runhost:rossmann-c010.rcac.purdue.edu   Rank:1 of 48 ranks   hello, world
   ...
Runhost:rossmann-c010.rcac.purdue.edu   Rank:11 of 48 ranks   hello, world
Runhost:rossmann-c011.rcac.purdue.edu   Rank:12 of 48 ranks   hello, world
Runhost:rossmann-c011.rcac.purdue.edu   Rank:13 of 48 ranks   hello, world
   ...
Runhost:rossmann-c011.rcac.purdue.edu   Rank:23 of 48 ranks   hello, world
Runhost:rossmann-c012.rcac.purdue.edu   Rank:24 of 48 ranks   hello, world
Runhost:rossmann-c012.rcac.purdue.edu   Rank:25 of 48 ranks   hello, world
   ...
Runhost:rossmann-c012.rcac.purdue.edu   Rank:35 of 48 ranks   hello, world
Runhost:rossmann-c013.rcac.purdue.edu   Rank:36 of 48 ranks   hello, world
Runhost:rossmann-c013.rcac.purdue.edu   Rank:37 of 48 ranks   hello, world
   ...
Runhost:rossmann-c013.rcac.purdue.edu   Rank:47 of 48 ranks   hello, world

Notes

  • In general, the exact order in which MPI ranks output similar write requests to an output file is random.
  • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Rossmann is "standby".
  • Invoking an MPI program on Rossmann with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke an MPI program.

For an introductory tutorial on how to write your own MPI programs:

OpenMP

A shared-memory job is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over several processor cores of a multi-core processor. Open Multi-Processing (OpenMP) is a specific implementation of the shared-memory model and is a collection of parallelization directives, library routines, and environment variables.

This section illustrates how to use PBS to submit to a batch session one of the OpenMP programs, either task parallelism or loop-level (data) parallelism, compiled in the section Compiling OpenMP Programs. There is no difference in running a Fortran, C, or C++ OpenMP program after compiling and linking it into an executable file.
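
For reference, an OpenMP task-parallel program that produces output of the form shown below could be sketched as follows. This is only an approximation for illustration; the actual omp_hello.c source appears in the Compiling OpenMP Programs section.

/* FILENAME:  omp_hello.c  (illustrative sketch, not the exact source) */

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void) {
    char runhost[256];

    gethostname(runhost, sizeof(runhost));

    /* Serial region: only the master thread runs here. */
    printf("SERIAL REGION:     Runhost:%s   Thread:%d of %d thread    hello, world\n",
           runhost, omp_get_thread_num(), omp_get_num_threads());

    /* Parallel region: one message per OpenMP thread (OMP_NUM_THREADS threads). */
    #pragma omp parallel
    {
        printf("PARALLEL REGION:   Runhost:%s   Thread:%d of %d threads   hello, world\n",
               runhost, omp_get_thread_num(), omp_get_num_threads());
    }

    /* Back in the serial region. */
    printf("SERIAL REGION:     Runhost:%s   Thread:%d of %d thread    hello, world\n",
           runhost, omp_get_thread_num(), omp_get_num_threads());
    return 0;
}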

When running OpenMP programs, all threads must be on the same compute node to take advantage of shared memory. The threads cannot communicate between nodes.

To run an OpenMP program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS mynumberofthreads

In bash:

$ export OMP_NUM_THREADS=mynumberofthreads

Suppose that you named your executable file omp_hello. Prepare a job submission file with an appropriate name, here named omp_hello.sub:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=24

./omp_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the program.

Submit the OpenMP job to the default queue on Rossmann and request 1 complete compute node with all 24 processor cores (OpenMP threads) on the compute node and 1 minute of wall time. This will use one complete compute node of the Rossmann cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 omp_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
omp_hello
omp_hello.c
omp_hello.sub
omp_hello.sub.emyjobid
omp_hello.sub.omyjobid

View the results from one of the sample OpenMP programs about task parallelism:

$ cat omp_hello.sub.omyjobid
SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 24 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:23 of 24 threads   hello, world
SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

If the job failed to run, then view error messages in the file omp_hello.sub.emyjobid.

If an OpenMP program uses a lot of memory and 24 threads use all of the memory of the compute node, use fewer processor cores (OpenMP threads) on that compute node.

Modify the job submission file omp_hello.sub to use half the number of processor cores:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=12

./omp_hello

Submit the job to the default queue. Be sure to request the whole node or other jobs may use the extra memory your job requires.

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 omp_hello.sub

View the results from one of the sample OpenMP programs about task parallelism and using half the number of processor cores:

$ cat omp_hello.sub.omyjobid

SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

Practice submitting the sample OpenMP program about loop-level (data) parallelism:

#!/bin/sh -l
# FILENAME:  omp_loop.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=24

./omp_loop

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 omp_loop.sub

SERIAL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 24 threads   Iteration:0  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 24 threads   Iteration:1  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 24 threads   Iteration:2  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 24 threads   Iteration:3  hello, world
   ...
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:23 of 24 threads   Iteration:46  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:23 of 24 threads   Iteration:47  hello, world
SERIAL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
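
For reference, a loop-parallel program that produces output of this shape could be sketched as follows. This is only an approximation for illustration; the actual omp_loop.c source appears in the Compiling OpenMP Programs section.

/* FILENAME:  omp_loop.c  (illustrative sketch, not the exact source) */

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

int main(void) {
    char runhost[256];
    int i;

    gethostname(runhost, sizeof(runhost));

    printf("SERIAL REGION:   Runhost:%s   Thread:%d of %d thread    hello, world\n",
           runhost, omp_get_thread_num(), omp_get_num_threads());

    /* Loop-level (data) parallelism: the 48 iterations are divided among the threads. */
    #pragma omp parallel for
    for (i = 0; i < 48; i++) {
        printf("PARALLEL LOOP:   Runhost:%s   Thread:%d of %d threads   Iteration:%d  hello, world\n",
               runhost, omp_get_thread_num(), omp_get_num_threads(), i);
    }

    printf("SERIAL REGION:   Runhost:%s   Thread:%d of %d thread    hello, world\n",
           runhost, omp_get_thread_num(), omp_get_num_threads());
    return 0;
}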

Hybrid

A hybrid job combines both message-passing and shared-memory attributes to take advantage of distributed-memory systems with multi-core processors. Work occurs across several compute nodes of a distributed-memory system and across the processor cores of the multi-core processors.

This section illustrates how to use PBS to submit to a batch session one of the hybrid programs compiled in the section Compiling Hybrid Programs. There is no difference in running a Fortran, C, or C++ hybrid program after compiling and linking it into an executable file.
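
For reference, a hybrid MPI/OpenMP program that produces output of the form shown below could be sketched as follows. This is only an approximation for illustration; the actual hybrid_hello.c source appears in the Compiling Hybrid Programs section.

/* FILENAME:  hybrid_hello.c  (illustrative sketch, not the exact source) */

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int rank, nranks, len;
    char runhost[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    MPI_Get_processor_name(runhost, &len);

    /* Serial region: each MPI rank runs a single thread here. */
    printf("SERIAL REGION:     Runhost:%s   Rank:%d of %d ranks, Thread:%d of %d thread    hello, world\n",
           runhost, rank, nranks, omp_get_thread_num(), omp_get_num_threads());

    /* Parallel region: each MPI rank spawns OMP_NUM_THREADS OpenMP threads. */
    #pragma omp parallel
    {
        printf("PARALLEL REGION:   Runhost:%s   Rank:%d of %d ranks, Thread:%d of %d threads   hello, world\n",
               runhost, rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    printf("SERIAL REGION:     Runhost:%s   Rank:%d of %d ranks, Thread:%d of %d thread    hello, world\n",
           runhost, rank, nranks, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}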

The path to relevant MPI libraries is not set up on any compute node by default. Using module load is the preferred way to access these libraries. Use module avail to see all software packages installed on Rossmann, including MPI library packages. Then, to use one of the available MPI modules, enter the module load command.

When running hybrid programs, use all processor cores of the compute nodes to take advantage of shared memory.

To run a hybrid program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS mynumberofthreads

In bash:

$ export OMP_NUM_THREADS=mynumberofthreads

Suppose that you named your executable file hybrid_hello. Prepare a job submission file with an appropriate filename, here named hybrid_hello.sub:

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile 
export OMP_NUM_THREADS=24 

mpiexec -n 2 -machinefile nodefile ./hybrid_hello

You can load any MPI library/compiler module that is available on Rossmann. This example uses the OpenMPI library.

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

You invoke a hybrid program with the mpiexec command. You may need to specify how to place the threads on the compute node. Several examples on how to specify thread placement with various MPI libraries are at the bottom of this section. The number of processes requested with mpiexec -n is usually equal to the number of MPI ranks of the application (more on this below).

Submit the hybrid job to the default queue on Rossmann and request 2 whole compute nodes with 1 MPI rank and all 24 processor cores (OpenMP threads) on each compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 hybrid_hello.sub
179168.rossmann-adm.rcac.purdue.edu

View two new files in your directory (.o and .e):

$ ls -l
hybrid_hello
hybrid_hello.c
hybrid_hello.sub
hybrid_hello.sub.emyjobid
hybrid_hello.sub.omyjobid

View the results from one of the sample hybrid programs about task parallelism:

$ cat hybrid_hello.sub.omyjobid

SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 24 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:23 of 24 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 24 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:23 of 24 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

If the job failed to run, then view error messages in the file hybrid_hello.sub.emyjobid.

If a hybrid job uses a lot of memory and 24 OpenMP threads per compute node uses all of the memory of the compute nodes, request more compute nodes (MPI ranks) and use fewer processor cores (OpenMP threads) on each compute node.

Prepare a job submission file with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile 
export OMP_NUM_THREADS=12

mpiexec -n 4 -machinefile nodefile ./hybrid_hello

Submit the job to the default queue on Rossmann with double the number of compute nodes (MPI ranks). Be sure to request the whole node or other jobs may use the extra memory your job requires.

$ qsub -l nodes=4:ppn=24,walltime=00:01:00 hybrid_hello.sub

View the results from one of the sample hybrid programs about task parallelism with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

$ cat hybrid_hello.sub.omyjobid

SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:0 of 1 thread    hello, world

Practice submitting the sample OpenMP program about loop-level (data) parallelism:

#!/bin/sh -l
# FILENAME:  hybrid_loop.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile 
export OMP_NUM_THREADS=24 

mpiexec -n 2 -machinefile nodefile ./hybrid_loop

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 hybrid_loop.sub


SERIAL REGION:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 24 threads   Iteration:0   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 24 threads   Iteration:1   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 24 threads   Iteration:2   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 24 threads   Iteration:3   hello, world
   ...
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:23 of 24 threads   Iteration:46   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:23 of 24 threads   Iteration:47   hello, world
SERIAL REGION:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 24 threads   Iteration:0   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 24 threads   Iteration:1   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 24 threads   Iteration:2   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 24 threads   Iteration:3   hello, world
   ...
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:23 of 24 threads   Iteration:46   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:23 of 24 threads   Iteration:47   hello, world
SERIAL REGION:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

Thread placement

Compute nodes are made up of two or more processor chips, or sockets. Typically, each socket shares a memory controller and communication buses among all of its cores. Consider these cores as having "shortcuts" to each other: cores within a socket can communicate faster and more efficiently among themselves than with another socket or compute node. MPI ranks should consequently be placed so that they can utilize these "shortcuts". When running hybrid codes, it is essential to specify this placement because, by default, some MPI libraries will limit a rank to a single core or may scatter a rank across processor chips.

Below are examples of how to specify this placement with several MPI libraries. Run hybrid codes within jobs that request entire nodes, either with ppn=24 or with the -n (node-exclusive) flag; otherwise the job may end up with unexpected and poor thread placement.

OpenMPI 1.6.3

mpiexec -cpus-per-rank $OMP_NUM_THREADS --bycore -np 2 -machinefile nodefile ./hybrid_loop

OpenMPI 1.8

mpiexec -map-by socket:pe=$OMP_NUM_THREADS -np 2 -machinefile nodefile ./hybrid_loop

Intel MPI

mpiexec -np 2 -machinefile nodefile ./hybrid_loop

MVAPICH2

mpiexec -env MV2_ENABLE_AFFINITY 0 -np 2 -machinefile nodefile ./hybrid_loop

Notes

  • In general, the exact order in which MPI processes of a hybrid program output similar write requests to an output file is random.
  • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Rossmann is "standby".
  • Invoking a hybrid program on Rossmann with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke a hybrid program.

Scratch File

Some applications process data stored in a large input data file. The size of this file may be so large that it cannot fit within the quota of a home directory. This file might reside on Fortress or some other external storage medium. The way to process this file on Rossmann is to copy it to your scratch directory where a job running on a compute node of Rossmann may access it.

This section illustrates how to submit a small job which reads a data file residing on the scratch file system. This example, myprogram.c, displays the name of the compute node which runs the job, the path of the current working directory, and the contents of that directory, then copies the contents of an input scratch file to an output scratch file. It uses Linux commands to access system information. To compile this program, see Compiling Serial Programs.
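
A rough sketch of such a program follows, assuming it receives the scratch directory as its only command-line argument and uses fixed file names (mybiginputdatafile and mybigoutputdatafile). It is an approximation for illustration only, not the exact myprogram.c used to produce the output below.

/* FILENAME:  myprogram.c  (illustrative sketch, not the exact source) */

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    char inpath[1024], outpath[1024], line[1024];
    FILE *in, *out;

    /* Show the run host, the current working directory, and its contents. */
    system("hostname");
    system("pwd");
    system("ls -l");

    printf("***  MAIN START  ***\n\n");

    if (argc < 2) {
        fprintf(stderr, "usage: %s scratchdirectory\n", argv[0]);
        return 1;
    }

    /* Build scratch file names from the directory given on the command line. */
    snprintf(inpath,  sizeof(inpath),  "%s/mybiginputdatafile",  argv[1]);
    snprintf(outpath, sizeof(outpath), "%s/mybigoutputdatafile", argv[1]);
    printf("input scratch file:   %s\n", inpath);
    printf("output scratch file:  %s\n", outpath);

    /* Copy the input scratch file to the output scratch file. */
    in  = fopen(inpath,  "r");
    out = fopen(outpath, "w");
    if (in == NULL || out == NULL) {
        fprintf(stderr, "cannot open scratch files\n");
        return 1;
    }
    while (fgets(line, sizeof(line), in) != NULL) {
        printf("scratch file system:  %s", line);
        fputs(line, out);
    }
    fclose(in);
    fclose(out);

    printf("\n***  MAIN  STOP  ***\n");
    return 0;
}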

Prepare a scratch file directory with a large input data file:

$ ls -l $RCAC_SCRATCH
total 96
-rw-r----- 1 myusername itap   27 Jun  8 10:41 mybiginputdatafile

Prepare a job submission file with the path to your scratch file directory listed as a command-line argument and with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load devel
cd $PBS_O_WORKDIR

./myprogram $RCAC_SCRATCH

Submit this job to the default queue on Rossmann and request 1 processor core of 1 compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it.

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View two new files in the home directory (.o and .e):

$ ls -l
total 160
-rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
-rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
-rw------- 1 myusername itap    0 Jun  8 11:05 myjob.sub.e266283
-rw------- 1 myusername itap  780 Jun  8 11:05 myjob.sub.o266283
-rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram*
-rw-r--r-- 1 myusername itap 3930 Jun  8 11:13 myprogram.c

View one new file in the scratch file directory, mybigoutputdatafile:

$ ls -l $RCAC_SCRATCH
total 96
-rw-r----- 1 myusername itap   27 Jun  8 10:41 mybiginputdatafile
-rw-r--r-- 1 myusername itap   42 Jun  8 11:05 mybigoutputdatafile

View results in the output file:

$ cat myjob.sub.o266283
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
rossmann-d036.rcac.purdue.edu
/home/myusername
total 128
-rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
-rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
-rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram
-rw-r--r-- 1 myusername itap 3976 Jun  8 10:45 myprogram.c
total 128
-rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
-rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
-rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram
-rw-r--r-- 1 myusername itap 3976 Jun  8 10:45 myprogram.c
***  MAIN START  ***

input scratch file:   /scratch/lustreA/m/myusername/mybiginputdatafile
output scratch file:  /scratch/lustreA/m/myusername/mybigoutputdatafile
scratch file system:  textfromscratchfile

***  MAIN  STOP  ***

The output shows the name of the compute node which PBS chose to run the job, the path of the current working directory (the user's home directory), before-and-after listings of the contents of the current working directory, and output from the application. The output scratch file named mybigoutputdatafile, the primary output of this program, appears in the scratch directory, not the home directory.

/tmp File

Some applications write a large amount of intermediate data to a temporary file during an early part of the process then read that data for further processing during a later part of the process. The size of this file may be so large that it cannot fit within the quota of a home directory or that it requires too much I/O activity between the compute node and either the home directory or the scratch file directory. The way to process this intermediate file on Rossmann is to use the /tmp directory of the compute node which runs the job. Used properly, /tmp may provide faster local storage to an active process than any other storage option.

This section illustrates how to submit a small job which first writes and then reads an intermediate data file residing in the /tmp directory. This example, myprogram.c, displays the /tmp directory entry for the intermediate file before and after processing. It uses Linux commands to access system information. To compile this program, see Compiling Serial Programs.
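
A rough sketch of such a program follows; it is an approximation for illustration only, not the exact myprogram.c used to produce the output below.

/* FILENAME:  myprogram.c  (illustrative sketch, not the exact source) */

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *tmpfile = "/tmp/mytmpfile";
    char data[64];
    FILE *fp;

    /* List the intermediate file before processing; this fails and the
       message lands in the job's .e file. */
    system("ls -l /tmp/mytmpfile");

    /* Write intermediate data to the compute node's local /tmp storage. */
    fp = fopen(tmpfile, "w");
    if (fp == NULL) {
        fprintf(stderr, "cannot create %s\n", tmpfile);
        return 1;
    }
    fprintf(fp, "abcdefghijk\n");
    fclose(fp);

    /* List the intermediate file after processing; this appears in the .o file. */
    system("ls -l /tmp/mytmpfile");

    printf("***  MAIN START  ***\n\n");

    /* Read the intermediate data back for further processing. */
    fp = fopen(tmpfile, "r");
    if (fp != NULL) {
        if (fgets(data, sizeof(data), fp) != NULL) {
            printf("/tmp file data:  %s", data);
        }
        fclose(fp);
    }

    /* Remove the local file before the job ends (good practice on shared nodes). */
    remove(tmpfile);

    printf("\n***  MAIN  STOP  ***\n");
    return 0;
}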

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load devel
cd $PBS_O_WORKDIR

./myprogram

Submit this job to the default queue on Rossmann and request 1 processor core of 1 compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View results in the output file, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
-rw-r--r-- 1 myusername itap 12 Jun 16 11:36 /tmp/mytmpfile
***  MAIN START  ***

/tmp file data:  abcdefghijk

***  MAIN  STOP  ***

The output verifies the existence of the intermediate data file in the /tmp directory.

View results in the error file, myjob.sub.emyjobid:

ls: /tmp/mytmpfile: No such file or directory

The results in the error file verify that the intermediate data file does not exist at the start of processing.

While the /tmp directory can provide faster local storage to an active process than other storage options, you never know how much storage is available in the /tmp directory of the compute node chosen to run your job. If an intermediate data file consistently fails to fit in the /tmp directories of a set of compute nodes, consider limiting the pool of candidate compute nodes to those which can handle your intermediate data file.
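
One defensive option, sketched below under the assumption that your program can decide where to write before it starts, is to check the free space in /tmp with statvfs() and fall back to your scratch directory when there is not enough room. The file name tmpcheck.c and the 10 GB threshold are hypothetical placeholders, not part of the guide's examples.

/* FILENAME:  tmpcheck.c  (illustrative sketch; hypothetical helper) */

#include <stdio.h>
#include <sys/statvfs.h>

/* Return the number of bytes available to unprivileged users under the given path. */
static unsigned long long bytes_available(const char *path) {
    struct statvfs fs;

    if (statvfs(path, &fs) != 0) {
        return 0;
    }
    return (unsigned long long) fs.f_bavail * fs.f_frsize;
}

int main(void) {
    /* Placeholder: the space your intermediate file actually needs. */
    const unsigned long long needed = 10ULL * 1024 * 1024 * 1024;

    if (bytes_available("/tmp") >= needed) {
        printf("enough room in /tmp; write the intermediate file there\n");
    } else {
        printf("not enough room in /tmp; fall back to the scratch directory\n");
    }
    return 0;
}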

Commercial and Third-Party Applications

Several commercial and third-party software packages are available on Rossmann and accessible through PBS.

We try to continually test the examples in the next few sections, but you may find some differences. If you need assistance, please contact us.

With the exception of Octave and R, which are free software, only Purdue affiliates may use the following licensed software.

Gaussian

Gaussian is a computational chemistry software package for electronic structure calculations. This section illustrates how to submit a small Gaussian job to a PBS queue. This Gaussian example runs the Fletcher-Powell multivariable optimization.

Prepare a Gaussian input file with an appropriate filename, here named myjob.com. The final blank line is necessary:

#P TEST OPT=FP STO-3G OPTCYC=2

STO-3G FLETCHER-POWELL OPTIMIZATION OF WATER

0 1
O
H 1 R
H 1 R 2 A

R 0.96
A 104.

To submit this job, load Gaussian then run the provided script, named subg09. This job uses one compute node with 8 processor cores:

$ module load gaussian09
$ subg09 myjob -l nodes=1:ppn=8

View job status:

$ qstat -u myusername

View results in the file for Gaussian output, here named myjob.log. Only the first and last few lines appear here:

 Entering Gaussian System, Link 0=/apps/rhel5/g09-B.01/g09/g09
 Initial command:
 /apps/rhel5/g09-B.01/g09/l1.exe /scratch/scratch96/m/myusername/gaussian/Gau-7781.inp -scrdir=/scratch/scratch96/m/myusername/gaussian/
 Entering Link 1 = /apps/rhel5/g09-B.01/g09/l1.exe PID=      7782.
  
 Copyright (c) 1988,1990,1992,1993,1995,1998,2003,2009,2010,
            Gaussian, Inc.  All Rights Reserved.

.
.
.

 Job cpu time:  0 days  0 hours  1 minutes 37.3 seconds.
 File lengths (MBytes):  RWF=      5 Int=      0 D2E=      0 Chk=      1 Scr=      1
 Normal termination of Gaussian 09 at Wed Mar 30 10:49:02 2011.
real 17.11
user 92.40
sys 4.97
Machine:
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389

Examples of Gaussian PBS Job Submissions

Submit job using 4 processor cores on a single node:

$ subg09 myjob -l nodes=1:ppn=4,walltime=200:00:00 -q myqueuename

Submit job using 4 processor cores on each of 2 nodes:

$ subg09 myjob -l nodes=2:ppn=4,walltime=200:00:00 -q myqueuename

Submit job using 8 processor cores on a single node:

$ subg09 myjob -l nodes=1:ppn=8,walltime=200:00:00 -q myqueuename

Submit job using 8 processor cores on each of 2 nodes:

$ subg09 myjob -l nodes=2:ppn=8,walltime=200:00:00 -q myqueuename

For more information about Gaussian:

Maple

Maple is a general-purpose computer algebra system. This section illustrates how to submit a small Maple job to a PBS queue. This Maple example differentiates, integrates, and finds the roots of polynomials.

Prepare a Maple input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Differentiate wrt x.
diff( 2*x^3,x );

# Integrate wrt x.
int( 3*x^2*sin(x)+x,x );

# Solve for x.
solve( 3*x^2+2*x-1,x );

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load maple
cd $PBS_O_WORKDIR

# Use the -q option to suppress startup messages.
# maple -q myjob.in
maple myjob.in

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load maple

# Use the -q option to suppress startup messages.
# maple -q << EOF
maple << EOF

# Differentiate wrt x.
diff( 2*x^3,x );

# Integrate wrt x.
int( 3*x^2*sin(x)+x,x );

# Solve for x.
solve( 3*x^2+2*x-1,x );
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
                                         2
                                      6 x

                                                           2
                      2                                   x
                  -3 x  cos(x) + 6 cos(x) + 6 x sin(x) + ----
                                                          2

                                    1/3, -1

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Maple:

Mathematica

Mathematica implements numeric and symbolic mathematics. This section illustrates how to submit a small Mathematica job to a PBS queue. This Mathematica example finds the three roots of a third-degree polynomial.

Prepare a Mathematica input file with an appropriate filename, here named myjob.in:

(* FILENAME:  myjob.in *)

(* Find roots of a polynomial. *)
p=x^3+3*x^2+3*x+1
Solve[p==0]
Quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load mathematica
cd $PBS_O_WORKDIR

math < myjob.in

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load mathematica
math << EOF

(* Find roots of a polynomial. *)
p=x^3+3*x^2+3*x+1
Solve[p==0]
Quit
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Mathematica 5.2 for Linux x86 (64 bit)
Copyright 1988-2005 Wolfram Research, Inc.
 -- Terminal graphics initialized --

In[1]:=
In[2]:=
In[2]:=
In[3]:=
                     2    3
Out[3]= 1 + 3 x + 3 x  + x

In[4]:=
Out[4]= {{x -> -1}, {x -> -1}, {x -> -1}}

In[5]:=

View the standard error file, myjob.sub.emyjobid:

rmdir: ./ligo/rengel/tasks: Directory not empty
rmdir: ./ligo/rengel: Directory not empty
rmdir: ./ligo: Directory not empty

For more information about Mathematica:

MATLAB

MATLAB® (MATrix LABoratory) is a high-level language and interactive environment for numerical computation, visualization, and programming. You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. MATLAB is a product of the MathWorks.

MATLAB, Simulink, Compiler, and several of the optional toolboxes are available to faculty, staff, and students. To see the kind and quantity of all MATLAB licenses, plus the number that you are currently using, use the matlab_licenses command:

$ module load matlab
$ matlab_licenses
                                                  Licenses
MATLAB Product / Toolbox Name            myusername    Free    Total
==================================      ===========================
Aerospace Blockset                              0       10       10
Aerospace Toolbox                               0       20       20
Bioinformatics Toolbox                          0       18       20
Communication Toolbox                           0       17       30
Compiler                                        0       15       15
Control Toolbox                                 0       67       75
Curve Fitting Toolbox                           0       51       95
Data Acq Toolbox                                0       10       10
Database Toolbox                                0        5        5
Datafeed Toolbox                                0        5        5
Dial and Gauge Blocks                           0       25       25
Econometrics Toolbox                            0       11       15
Excel Link                                      0        5        5
Financial Toolbox                               0       14       15
Fixed-Point Blocks                              0        5        5
Fixed Point Toolbox                             0       20       20
Fuzzy Toolbox                                   0       10       10
GADS Toolbox                                    0       11       15
Identification Toolbox                          0       15       15
Image Acquisition Toolbox                       0        5        5
Image Toolbox                                   0       81      120
Instr Control Toolbox                           0       12       25
MAP Toolbox                                     0       21       30
MATLAB                                          0      450    1,000
MATLAB Builder for dot Net                      0        1        1
MATLAB Builder for Java                         0        0        1
MATLAB Coder                                    0       27       35
MATLAB Distrib Comp Server                      0      256      256
MATLAB Excel Builder                            0        0        1
MATLAB Report Gen                               0        2        2
MBC Toolbox                                     0        5        5
MPC Toolbox                                     0        5        5
Neural Network Toolbox                          0       14       15
OPC Toolbox                                     0        1        1
Optimization Toolbox                            0       76      125
Parallel Computing Toolbox                      0       38       50
PDE Toolbox                                     0       15       15
Power System Blocks                             0       26       30
Real-Time Win Target                            0       10       17
Real-Time Workshop                              0       12       35
RF Toolbox                                      0        0        1
Robust Toolbox                                  0        4        5
RTW Embedded Coder                              0       15       15
Signal Blocks                                   0       27       30
Signal Toolbox                                  0       65      100
SimBiology                                      0        5        5
SimHydraulics                                   0       15       15
SimMechanics                                    0        5        5
Simscape                                        0       22       30
SIMULINK                                        0       78      100
Simulink Control Design                         0       15       15
Simulink Design Optim                           0        5        5
SIMULINK Report Gen                             0        2        2
SL Verification Validation                      0        4        5
Stateflow                                       0       14       15
Statistics Toolbox                              0       19      120
Symbolic Toolbox                                0       59       75
Virtual Reality Toolbox                         0        5        5
Wavelet Toolbox                                 0       14       15
XPC Target                                      0       19       20

The table shows a list of MATLAB toolboxes available at Purdue, the number of licenses that you are currently using, a snapshot of the number of licenses currently free, and the total number of licenses which Purdue owns for each product. To limit the output to only the toolboxes your jobs are currently using, you can use the -u flag.

The MATLAB client can be run on the front end for application development; however, computationally intensive jobs must be run on compute nodes. This means using the MATLAB function batch(), or running your MATLAB client on a compute node through the PBS scheduler.

MATLAB distinguishes three types of parallel jobs: distributed, matlabpool, and parallel. A distributed job is one or more independent, single-processor-core tasks of MATLAB statements, also known as an embarrassingly parallel job.

A pool job follows a master/worker model, in which one worker distributes and oversees the work accomplished by the rest of the worker pool. A pool job can also implement codistributed arrays as a means of handling data arrays which are too large to fit into the memory of any one compute node.

A parallel job is a single task running concurrently on two or more processor cores. The copies of the task are not independent; they may interact with each other. A parallel job is also known as a data-parallel job.

MATLAB also offers two kinds of profiles for parallel execution: the 'local' profile and user-defined cluster profiles. To run a MATLAB job on compute node(s) different from the node running the client, you must define a Cluster Profile using the Cluster Profile Manager. In addition, MATLAB offers implicit parallelism by default in the form of thread-parallel enabled functions.

The following sections provide several examples illustrating how to submit MATLAB jobs to a Linux compute cluster.

For more information about MATLAB:

MATLAB (Cluster Profile Manager)

MATLAB offers two kinds of profiles for parallel execution: the 'local' profile and user-defined cluster profiles. The 'local' profile runs a MATLAB job on the processor core(s) of the same compute node that is running the client. To run a MATLAB job on compute node(s) different from the node running the client, you must define a Cluster Profile using the Cluster Profile Manager.

To prepare a user-defined cluster profile, use the Cluster Profile Manager in the Parallel menu. This profile contains the PBS details (queue, nodes, ppn, walltime, etc.) of your job submission. Ultimately, your cluster profile will be an argument to MATLAB functions like batch(). To learn more about MATLAB's Parallel Computing Toolbox and setting up Cluster Profiles, please review the "Getting Started" section in the MATLAB documentation. The documentation for the release installed on Rossmann can be accessed directly from a MATLAB session:

$ module load matlab
$ matlab -nodisplay -singleCompThread
>> doc distcomp

For your convenience, RCAC provides a generic cluster profile that can be downloaded here. To import the profile, start a MATLAB session and select Manage Cluster Profiles... from the Parallel menu. In the Cluster Profile Manager, select Import, navigate to the folder containing the profile, select mypbsprofile.settings and click OK. Please remember that the profile will need to be customized for your specific needs. If you have any questions, please contact us.

For detailed information about MATLAB's Parallel Computing Toolbox, examples, demos, and tutorials:

MATLAB (Interpreting an M-file)

The MATLAB interpreter is the part of MATLAB which reads M-files and MEX-files and executes MATLAB statements.

This section illustrates how to submit a small, serial, MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and displays three random numbers. The system function hostname returns two values: a code and the run host name.

Prepare a MATLAB script M-file myscript.m, and a MATLAB function M-file myfunction.m:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name);

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

Also, prepare a job submission file, here named myjob.sub, which runs MATLAB with the name of the script M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -singleCompThread -r myscript

Submit the job to a single compute node with one processor core:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
97986.rossmann-ad myusername      standby  myjob.sub    4645   1   1    --  00:01 R 00:00

Output shows one compute node (NDS) with one processor core (TSK).

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a001.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


hostname:rossmann-a001.rcac.purdue.edu

0.814724 0.905792 0.126987

Output shows that a processor core on one compute node (a001) processed the entire job. One processor core processed myjob.sub and myscript.m. Output also displays the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about MATLAB:

MATLAB Compiler (Compiling an M-file)

The MATLAB Compiler translates an M-file into a standalone application or software component. A compiled version of an M-file can substantially improve performance of MATLAB code, especially for statements like for and while. The MATLAB Compiler Runtime (MCR) is a standalone set of shared libraries. Together, compiling and the MCR enable the execution of MATLAB files, even outside the MATLAB environment. While you do need to purchase a MATLAB Compiler license to build an executable, you may freely distribute the executable and the MCR without license restrictions.

This section illustrates how to compile and submit a small, serial, MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and displays three random numbers. The system function hostname returns two values: a code and the run host name.

This example uses the MATLAB Compiler mcc to compile a MATLAB M-file. During compilation, the default cluster profile may be either the 'local' profile or your PBS cluster profile; the results will be the same. The compiled program then runs entirely on compute nodes, off the front end.

The MATLAB Compiler license is a lingering license. Using the compiler locks its license for at least 30 minutes. For this reason, and to minimize your license usage, it is best to run the Compiler on one cluster.

Prepare either a MATLAB script M-file or a MATLAB function M-file. The method described below works for both.

The MATLAB script M-file includes the MATLAB statement quit to ensure that the compiled program terminates. Use an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name)

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;

The MATLAB function M-file has the usual function and end statements. Use an appropriate filename, here named myfunction.m:

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_myscript.sh /apps/rhel5/MATLAB/R2012a

On a front end, load modules for MATLAB and GCC and verify the versions loaded. The MATLAB Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Compile the MATLAB script M-file:

$ module load matlab
$ module load gcc
$ mcc -m myscript.m

A few new files appear after the compilation:

mccExcludedFiles.log
myscript
myscript.prj
myscript_main.c
myscript_mcc_component_data.c
readme.txt
run_myscript.sh

The name of the stand-alone executable file is myscript. The name of the shell script to run this executable file is run_myscript.sh.

To obtain the name of the compute node which runs this compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname near the top of the script, just before the first echo statement, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_myscript.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/myscript $*
fi
exit

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
378428.rossmann-ad myusername standby  myjob.sub   18964   1   1    --  00:01 R 00:00

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a001.rcac.purdue.edu
run_myscript.sh
rossmann-a001.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2012a/runtime/glnxa64:/apps/rhel5/MATLAB_R2012a/bin/glnxa64:/apps/rhel5/MATLAB_R2012a/sys/os/glnxa
64:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64
/server:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.


hostname:rossmann-a001.rcac.purdue.edu

0.814724 0.905792 0.126987

Output shows the name of the compute node that ran the job submission file myjob.sub, the name of the compute node that ran the compiler-generated script run_myscript.sh, and the name of the compute node that ran the serial job: a001 in all three cases. Output also shows the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply this method of job submission to a MATLAB function M-file, prepare a wrapper function which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2012a

Compile both the wrapper and the function then submit:

$ mcc -m mywrapper.m myfunction.m
$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.

For more information about the MATLAB Compiler:

MATLAB Executable (MEX-file)

MEX stands for MATLAB Executable. A MEX-file offers an interface which allows MATLAB code to call functions written in C, C++, or Fortran as though these external functions were built-in MATLAB functions. MATLAB also offers external interface functions that facilitate the transfer of data between MEX-files and MATLAB. A MEX-file usually starts by transferring data from MATLAB to the MEX-file; then it processes the data with the user-written code; and finally, it transfers the results back to MATLAB. This feature involves compiling then dynamically linking the MEX-file to the MATLAB program. You may wish to use a MEX-file if you would like to call an existing C, C++, or Fortran function directly from MATLAB rather than reimplementing that code as a MATLAB function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than MATLAB, you may be able to substantially improve performance over MATLAB source code, especially for statements like for and while. Areas of application include legacy code written in C, C++, or Fortran.

This section illustrates how to use the PBS qsub command to submit a small MATLAB job with a MEX-file to a PBS queue.

This MEX example calls a C function which employs serial code to add two matrices. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license. Start with a plain C computational routine which performs the matrix addition:

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Matrix (component-wise) addition. */
    for (i = 0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

Combine the computational routine with a MEX-file, which contains the necessary external function interface of MATLAB. In the computational routine, change int to mwSize. Use an appropriate filename, here named matrixSum.c:

/***********************************************************
 * FILENAME:  matrixSum.c
 *
 * Adds two MxN arrays (inMatrix).
 * Outputs one MxN array (outMatrix).
 *
 * The calling syntax is:
 *
 *      matrixSum (inMatrix, inMatrix, outMatrix, size)
 *
 * This is a MEX-file for MATLAB.
 *
 **********************************************************/

#include "mex.h"

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, mwSize n) {
    mwSize i;

    /* Component-wise addition. */
    for (i = 0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Gateway Function */
void mexFunction (int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[]) {
    double *inMatrix_a;               /* mxn input matrix  */
    double *inMatrix_b;               /* mxn input matrix  */
    mwSize nrows_a,ncols_a;           /* size of matrix a  */
    mwSize nrows_b,ncols_b;           /* size of matrix b  */
    double *outMatrix_c;              /* mxn output matrix */

    /* Check for proper number of arguments. */
    if(nrhs!=2) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nrhs","Two inputs required.");
    }
    if(nlhs!=1) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nlhs","One output required.");
    }

    /* Get dimensions of the first input matrix. */
    nrows_a = mxGetM(prhs[0]);
    ncols_a = mxGetN(prhs[0]);
    /* Get dimensions of the second input matrix. */
    nrows_b = mxGetM(prhs[1]);
    ncols_b = mxGetN(prhs[1]);

    /* Check for equal number of rows. */
    if(nrows_a != nrows_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of rows.");
    }
    /* Check for equal number of columns. */
    if(ncols_a != ncols_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of columns.");
    }

    /* Make a pointer to the real data in the first input matrix. */
    inMatrix_a = mxGetPr(prhs[0]);
    /* Make a pointer to the real data in the second input matrix. */
    inMatrix_b = mxGetPr(prhs[1]);

    /* Make the output matrix. */
    plhs[0] = mxCreateDoubleMatrix(nrows_a,ncols_a,mxREAL);

    /* Make a pointer to the real data in the output matrix. */
    outMatrix_c = mxGetPr(plhs[0]);

    /* Call the computational routine. */
    matrixSum(inMatrix_a,inMatrix_b,outMatrix_c,nrows_a*ncols_a);
}

Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
A = [1,1,1;1,1,1]
B = [2,2,2;2,2,2]
C = matrixSum(A,B)

quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -singleCompThread -r myscript

To access the MATLAB utility mex, load a MATLAB module. mex expects shared libraries from GCC Version 4.3.x, which is not available on Rossmann, so load the default GCC module instead. MEX does not officially support GCC 4.6 and later, so it prints a version warning, but the resulting MEX-file works. Compile matrixSum.c into a MATLAB-callable MEX-file:

$ module load matlab
$ module load gcc
$ mex matrixSum.c

The name of the MATLAB-callable MEX-file is matrixSum.mexa64. If you see the following warning, ignore it:

Warning: You are using gcc version "4.6.2".  The version
         currently supported with MEX is "4.3.4".
         For a list of currently supported compilers see:
         http://www.mathworks.com/support/compilers/current_release/
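
Before submitting the batch job, you may optionally verify the MEX-file from an interactive MATLAB session (for example, on a front end). This is only a minimal check and assumes matrixSum.mexa64 is in the current working directory:

% Optional interactive check of the compiled MEX-file.
% Assumes matrixSum.mexa64 is in the current working directory.
A = [1,1,1;1,1,1];
B = [2,2,2;2,2,2];
C = matrixSum(A,B)     % expected: a 2x3 matrix of all 3s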

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a148.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

myscript.m:  hostname:rossmann-a148.rcac.purdue.edu

A =

     1     1     1
     1     1     1


B =

     2     2     2
     2     2     2


C =

     3     3     3
     3     3     3

Output shows the name of the compute node (a148) which processed this serial job. Also, this job shared the compute node with other jobs.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the MATLAB MEX-file:

MATLAB Standalone Program

A stand-alone MATLAB program is a C, C++, or Fortran program which calls user-written M-files and the same libraries which MATLAB uses. A stand-alone program has access to MATLAB objects, such as the array and matrix classes, as well as all the MATLAB algorithms. If you would like to implement performance-critical routines in C, C++, or Fortran and still call select MATLAB functions, a stand-alone MATLAB program may be a good option. This offers the possibility of substantially improved performance over MATLAB source code, especially for statements like for and while, while still allowing the use of specialized MATLAB functions where useful.

This section illustrates how to submit a small, stand-alone, MATLAB program to a PBS queue. This C example calls a compiled MATLAB script which computes the inverse of a matrix. This example, when executed, does not use the MATLAB interpreter, so it neither requires nor checks out a MATLAB license.

Prepare a MATLAB function which returns the inverse of a matrix. Use an appropriate filename, here named myinverse.m:

% FILENAME:  myinverse.m

function Y = myinverse (X)

    % Display name of compute node which runs this function.
    [c name] = system('hostname');
    fprintf('\n\nhostname:%s\n', name)

    % Invert a matrix.
    Y = inv(X);

end

Prepare a second MATLAB function which displays a matrix. Use an appropriate filename, here named myprintmatrix.m:

% FILENAME:  myprintmatrix.m

function myprintmatrix(A)
         disp(A)
end

Prepare a C source file with a main function and the necessary external function interface and give it an appropriate filename, here named myprogram.c. Note that when you invoke a MATLAB function from C, the MATLAB function name appears "mangled". The C program invokes the MATLAB function myinverse using the name mlfMyinverse and the MATLAB function myprintmatrix using the name mlfMyprintmatrix. You must modify all MATLAB function names in this manner when you call them from outside MATLAB:

/* FILENAME:  myprogram.c

Inverse of:

      A                B
   -------        ------------
   1  2  1         1 -3/2  1/2
   1  1  1   -->   1  -1   0
   3 -1  1        -2  7/2 -1/2



    1.0000   -1.5000    0.5000
    1.0000   -1.0000         0
   -2.0000    3.5000   -0.5000

*/


#include <stdio.h>
#include <string.h>       /* for memcpy() */
#include <math.h>
#include "libmylib.h"     /* compiler-generated header file */

int main (const int argc, char ** argv) {

    mxArray *A;   /* matrix containing input data           */
    mxArray *B;   /* matrix containing result               */

    int Nrow=3, Ncol=3;
    double a[] = {1,2,1,1,1,1,3,-1,1};  /* row-major order (for reference; unused) */
    double b[] = {1,1,3,2,1,-1,1,1,1};  /* col-major order  */
    double *ptr;

    printf("Enter myprogram.c\n");

    libmylibInitialize();     /* call mylib initialization  */

    /* Make an Nrow x Ncol MATLAB matrix (initialized to zero). */
    A = mxCreateDoubleMatrix(Nrow, Ncol, mxREAL);

    /* Initialize the MATLAB matrix.                        */
    ptr = (double *)mxGetPr(A);
    memcpy(ptr,b,Nrow*Ncol*sizeof(double));

    /* Call mlfMyinverse, the compiled version of myinverse.m. */
    mlfMyinverse(1,&B,A);

    /* Print the results. */
    mlfMyprintmatrix(B);

    /* Free the matrices allocated during this computation. */
    mxDestroyArray(A);
    mxDestroyArray(B);

    libmylibTerminate();     /* call mylib termination      */

    printf("Exit myprogram.c\n");
    return 0;
}

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

./myprogram

To access the MATLAB Compiler mcc and the build utility mbuild, load a MATLAB module. The MATLAB Compiler, mcc, depends on shared libraries from GCC Version 4.3.x. This version is not available on Rossmann, but GCC Version 4.6.2 is compatible. Compile the user-written MATLAB functions into a dynamically loaded shared library, then compile and link the C program:

$ module load matlab
$ module load gcc
$ mcc -W lib:libmylib -T link:lib myinverse.m myprintmatrix.m
$ mbuild myprogram.c -L. -lmylib -I.

Several new files appear after the compilation:

libmylib.c
libmylib.exports
libmylib.h
libmylib.so
mccExcludedFiles.log
myinverse
myprintmatrix
myprogram
readme.txt

The name of the compiled, stand-alone MATLAB program is myprogram. The name of the dynamically linked library of user-written MATLAB functions is libmylib.so.

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a145.rcac.purdue.edu
Enter myprogram.c
Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: Unable to load Java Runtime Environment: libjvm.so: cannot open shared object file: No such file or directory
Warning: Disabling Java support


hostname:rossmann-a145.rcac.purdue.edu

    1.0000   -1.5000    0.5000
    1.0000   -1.0000         0
   -2.0000    3.5000   -0.5000

Exit myprogram.c

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the MATLAB stand-alone program:

MATLAB Engine Program

The MATLAB Engine allows using MATLAB as a computation engine. A MATLAB Engine program is a standalone C, C++, or Fortran program which calls functions of the Engine Library allowing you to start and end a MATLAB process, send data to and from MATLAB, and send commands to be processed in MATLAB.

This section illustrates how to submit a small, stand-alone, MATLAB Engine program to a PBS queue. This C program calls functions of the Engine Library to compute the inverse of a matrix. This example, when executed, does not use the MATLAB interpreter, so it neither requires nor checks out a MATLAB license.

Prepare a C program which computes the inverse of a matrix. Use an appropriate filename, here named myprogram.c:

/* FILENAME:  myprogram.c

A simple program to illustrate how to call MATLAB Engine functions
from a C program.  

Inverse of:

      A                B
   -------        ------------
   1  2  1         1 -3/2  1/2
   1  1  1   -->   1  -1   0
   3 -1  1        -2  7/2 -1/2

*/


#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "engine.h"
#define  BUFSIZE 256


int main ()
{
    Engine *ep;
    mxArray *A = NULL;
    mxArray *B = NULL;
    int Ncol=3, Nrow=3, col, row, ndx;
    double a[] = {1,1,3,2,1,-1,1,1,1};  /* col-major order  */
    double b[9] = {9,9,9,9,9,9,9,9,9};
    char buffer[BUFSIZE+1];

    printf("Enter myprogram.c\n");

    /* Call engOpen with a NULL string. This starts a MATLAB process */
    /* on the current host using the command "matlab".               */
    if (!(ep = engOpen(""))) {
        fprintf(stderr, "\nCan't start MATLAB engine\n");
        return EXIT_FAILURE;
    }

    buffer[BUFSIZE] = '\0';
    engOutputBuffer(ep, buffer, BUFSIZE);

    /* Make a variable for the data. */
    A = mxCreateDoubleMatrix(Ncol, Nrow, mxREAL);
    B = mxCreateDoubleMatrix(Ncol, Nrow, mxREAL);
    memcpy((void *)mxGetPr(A), (void *)a, sizeof(a));

    /* Place the variable A into the MATLAB workspace. */
    /* Place the variable B into the MATLAB workspace. */
    engPutVariable(ep, "A", A);
    engPutVariable(ep, "B", B);

    /* Evaluate and display the inverse. */
    engEvalString(ep, "B = inv(A)");
    printf("%s", buffer);

    /* Get variable B from the MATLAB workspace.       */
    /* Copy inverted matrix to a C array named "b".    */
    B = engGetVariable(ep, "B");
    memcpy((void *)b, (void *)mxGetPr(B), sizeof(b));
    ndx = 0;
    for (col=0;col<Ncol;++col) {
        for (row=0;row<Nrow;++row) {
            printf("  %5.1f", b[row*Nrow+col]);
            ++ndx;
        }
        printf("\n");
    }

    /* Free memory.                       */
    mxDestroyArray(A);
    mxDestroyArray(B);

    /* Close MATLAB engine.               */
    engClose(ep);

    /* Exit C program.                    */
    printf("Exit myprogram.c\n");
    return EXIT_SUCCESS;
}

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

./myprogram

Copy MATLAB file engopts.sh to the directory from which you intend to submit Engine jobs. Compile myprogram.c:

$ cp /apps/rhel5/MATLAB/R2011b/bin/engopts.sh .
$ mex -f engopts.sh myprogram.c

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a210.rcac.purdue.edu
Enter myprogram.c
>>
B =

    1.0000   -1.5000    0.5000
    1.0000   -1.0000         0
   -2.0000    3.5000   -0.5000

    1.0   -1.5    0.5
    1.0   -1.0    0.0
   -2.0    3.5   -0.5
Exit myprogram.c

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the MATLAB Engine program:

MATLAB Implicit Parallelism

MATLAB implements implicit parallelism which, in general, is the exploitation of parallelism that is inherent in many computations, such as matrix multiplication, linear algebra, and performing the same operation on a set of numbers. Implicit parallelism is a form of multithreading which uses hardware to execute multiple threads efficiently. This is different from the explicit parallelism of the Parallel Computing Toolbox. Multithreading aims to increase utilization of a single processor core by using thread-level as well as instruction-level parallelism.

MATLAB offers implicit parallelism in the form of thread-parallel enabled functions. These functions run on the multicore processors of typical Linux clusters. Since these processor cores, or threads, share a common memory, many MATLAB functions contain multithreading potential. Vector operations, the particular application or algorithm, and the amount of computation (array size) contribute to the determination of whether a function runs serially or with multithreading.

When your job triggers implicit parallelism, it attempts to allocate its threads on all processor cores of the compute node on which the MATLAB client is running, including processor cores running other jobs. This competition can degrade the performance of all jobs running on the node. If an affected processor core participates in a larger, distributed-memory, parallel job involving many other nodes, then performance degradation can become much more widespread.

When you know that you are coding a serial job but are unsure whether you are using thread-parallel enabled operations, run MATLAB with implicit parallelism turned off. Beginning with R2009b, you can turn multithreading off by starting MATLAB with -singleCompThread:

$ matlab -nodisplay -singleCompThread -r mymatlabprogram

When you are using implicit parallelism, request exclusive access to a compute node by requesting all cores which are physically available on a node of a compute cluster:

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 myjob.sub

Parallel Computing Toolbox commands, such as spmd, preempt multithreading. Note that opening a MATLAB pool neither prevents multithreading nor changes the thread count in effect.
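
If you need to verify or adjust the implicit thread count from within MATLAB itself, the function maxNumCompThreads reports and sets it. This is only a sketch; MathWorks documents maxNumCompThreads as deprecated in later releases, so starting MATLAB with -singleCompThread remains the recommended approach for batch jobs:

% Query and (optionally) restrict MATLAB's implicit thread count.
% maxNumCompThreads is deprecated in later MATLAB releases; prefer
% starting MATLAB with -singleCompThread for production batch jobs.
n = maxNumCompThreads            % current maximum number of computational threads
oldN = maxNumCompThreads(1);     % force single-threaded computation; returns the previous setting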

For more information about MATLAB's implicit parallelism:

MATLAB Parallel Computing Toolbox (parfor)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. It offers a shared-memory computing environment with a maximum of 12 workers (labs, threads; starting in version R2011a) running on the local cluster profile in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the fine-grained parallelism of a parallel for loop (parfor) in a pool job. Areas of application include for loops with independent iterations.

The following examples illustrate three methods of submitting a small, parallel, MATLAB program with a parallel loop (parfor statement) as a batch, MATLAB pool job to a PBS queue. This MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each iteration of the parfor loop. The system function hostname returns two values: a numerical code and the name of the compute node that runs each iteration of the parallel loop.

The first method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB batch() function with a user-defined cluster profile.

The second method uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS configuration which scatters the MATLAB workers onto different compute nodes.

The third method uses the MATLAB compiler mcc and a user-defined Torque cluster profile to compile a MATLAB M-file and submits the compiled file to a PBS queue.

Prepare a MATLAB pool program in the form of a MATLAB script M-file and a MATLAB function M-file with appropriate filenames, here named myscript.m and myfunction.m:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;        % get elapsed time in parallel loop
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "result" is a "reduction" variable.
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    numlabs = matlabpool('size');
    r = sprintf('                hostname                         numlabs  labindex  iteration');
    result = strvcat(result,r);
    r = sprintf('                -------------------------------  -------  --------  ---------');
    result = strvcat(result,r);
    tic;

    % PARALLEL LOOP 
    parfor i = 1:8
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
        result = strvcat(result,r);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel loop
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('Elapsed time in parallel loop:   %f', elapsed_time);
    result = strvcat(result,r);

end

Both M-files display the names of all compute nodes which run the job. The parfor statement does not set the values of variables numlabs or labindex, but function matlabpool() can return the pool size. The M-file script uses fprintf() to display the results. The M-file function returns a single value which contains a concatenation of the results.

The execution of a pool job starts with a worker (the batch session) executing the statements of the first serial region up to the parfor block, where it pauses. A set of workers (the pool) executes the parfor block. When they finish, the batch session resumes by executing the second serial region. The code displays the names of the compute nodes running the batch session and the worker pool.

The first method of job submission uses the PBS qsub command to submit a job to a PBS queue, and the function batch(). The batch session distributes the independent iterations of the loop to the workers of the pool. The workers of the pool simultaneously process their respective portions of the workload of the parallel loop, so the parallel loop might run faster than the equivalent serial version. A pool size of N requires N+1 workers (processor cores). The source code is a MATLAB M-file (the MATLAB function batch() accepts either a script M-file or a function M-file).

This method uses the batch() function and either the M-file script myscript.m or the M-file function myfunction.m. In this case the MATLAB client runs on a compute node and uses a user-defined cluster profile.

Prepare a MATLAB script M-file that calls the MATLAB function batch() to make a four-lab pool, run the MATLAB code in myscript.m on that pool using the 'mypbsprofile' cluster profile, and capture the job output in the diary. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

pjob=batch('myscript','Matlabpool',4,'Profile','mypbsprofile','CaptureDiary',true);
pjob.wait;
pjob.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

matlab -nodisplay -r mylclbatch

Submit the job to a single compute node, requesting one processor core and one PCT license:

$ qsub -l nodes=1:ppn=1,walltime=01:00:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m.

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername      standby  myjob.sub   30197   1   1    --  00:01 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername      standby  myjob.sub   30197   1   1    --  00:01 R 00:00
99026.rossmann-ad myusername      standby  Job1          668   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


mylclbatch.m
rossmann-a000.rcac.purdue.edu
SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu

                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a001.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a002.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a001.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a002.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a003.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a003.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a004.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a004.rcac.purdue.edu            4         1          8

SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu
Elapsed time in parallel loop:   5.411486

Output shows that the property Matlabpool (4) defined the number of labs in the pool which processed the parallel for loop. While output does not explicitly show the fifth lab, that lab runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the MATLAB pool requires the worker running the batch session in addition to N labs in the pool, there must be at least N+1 processor cores available on the cluster.

Output shows that one compute node (a000) processed the job submission file myjob.sub, the script mylclbatch.m, and the batch session myscript.m, which includes the two serial regions, while four labs on other compute nodes (a001,a002,a003,a004) processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. Output shows the iterations of the parfor loop in scrambled order since the labs process each iteration independently of the other iterations. Finally, the output shows the time that the four labs spent running the eight iterations of the parfor loop, which decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply this first method of job submission to a function M-file, modify mylclbatch.m with one of the following sequences:

pjob=batch('myfunction','Matlabpool',4,'Profile','mypbsprofile','CaptureDiary',true);
pjob.wait;
pjob.diary

pjob=batch('myfunction',1,{},'Matlabpool',4,'Profile','mypbsprofile');
pjob.wait;
result = getAllOutputArguments(pjob);
result{1}

pjob=batch(@myfunction,1,{},'Matlabpool',4,'Profile','mypbsprofile');
pjob.wait;
result = getAllOutputArguments(pjob);
result{1}

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch().
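
For example, a hypothetical scaled-up variant of mylclbatch.m with an eight-lab pool might look like the following (the pool size of eight is illustrative only; remember that an N-lab pool uses N+1 workers, so eight labs require nine workers and a correspondingly longer wall time in the qsub command):

% FILENAME:  mylclbatch.m   (hypothetical scaled-up variant)

% Run myscript.m on an eight-lab pool via the 'mypbsprofile' cluster profile.
% An eight-lab pool uses nine workers: eight labs plus one batch session.
pjob=batch('myscript','Matlabpool',8,'Profile','mypbsprofile','CaptureDiary',true);
pjob.wait;
pjob.diary
quit;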

Specifying a MATLAB pool with 12 labs means a total of 13 workers. This exceeds the 12-worker limit of the 'local' configuration in MATLAB R2011b. The relevant lines of code and the error follow:

pjob=batch('myscript','Matlabpool',12,'Profile','local','CaptureDiary',true);

$ qsub -l nodes=1:ppn=14,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using batch (line 172)
You requested a minimum of 13 workers but only 12 workers are allowed with the
local scheduler.

Error in mylclbatch (line 6)
pjob=batch('myscript','Matlabpool',12,'Profile','local','CaptureDiary',true);}

The second method uses either a MATLAB script M-file or a MATLAB function M-file, and uses a user-defined cluster profile.

Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel loop
matlabpool close;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "result" is a "reduction" variable.
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    numlabs = matlabpool('size');
    r = sprintf('                hostname                         numlabs  labindex  iteration');
    result = strvcat(result,r);
    r = sprintf('                -------------------------------  -------  --------  ---------');
    result = strvcat(result,r);
    tic;

    % PARALLEL LOOP 
    parfor i = 1:8
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
        result = strvcat(result,r);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel loop
    matlabpool close;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('elapsed time:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

matlab -nodisplay -r myscript

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> parallel.defaultClusterProfile('mypbsprofile');
>> quit;
$

Submit the job to a single compute node, requesting one processor core and one PCT license:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00
332028.rossmann-ad myusername      standby  Job1          668   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu

Starting matlabpool using the 'mypbsprofile' configuration ... connected to 4 labs.
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a008.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a008.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a009.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a009.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a010.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a010.rcac.purdue.edu            4         1          8

Sending a stop signal to all the labs ... stopped.


SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu
Elapsed time in parallel loop:   3.382151

Output shows the name of the compute node (a000) that processed the job submission file myjob.sub and the two serial regions. The job submission scattered the four MATLAB labs among four different compute nodes (a007,a008,a009,a010), which processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop; labs process each iteration independently of the other iterations, so output from the iterations appears in random order. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop, which decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsprofile by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased.

The third method of job submission uses the MATLAB Compiler mcc to compile a MATLAB function M-file with a PBS configuration and submits the compiled file to a PBS queue. This method uses a MATLAB function M-file and a user-defined cluster profile.

Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements. Proceed with the MATLAB function M-file myfunction.m (when compiling a parfor statement, the parfor must be in a function, not in a script; this is a bug in MATLAB):

% FILENAME:  myscript.m

warning off all;

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel loop
matlabpool close;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    warning off all;

    % SERIAL REGION
    % Variable "result" is a "reduction" variable.
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    numlabs = matlabpool('size');
    r = sprintf('                hostname                         numlabs  labindex  iteration');
    result = strvcat(result,r);
    r = sprintf('                -------------------------------  -------  --------  ---------');
    result = strvcat(result,r);
    tic;

    % PARALLEL LOOP 
    parfor i = 1:8
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
        result = strvcat(result,r);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel loop
    matlabpool close;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('Elapsed time in parallel loop:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2012a

On a front end, load modules for MATLAB and GCC. The MATLAB Compiler mcc depends on shared libraries from GCC which is available on Rossmann. Set the default cluster profile to the user-defined cluster configuration and quit MATLAB. Compile both the MATLAB script M-file mywrapper.m and the MATLAB function M-file myfunction.m:

$ module load matlab
$ module load gcc
$ matlab -nodisplay
>> defaultParallelConfig('mypbsprofile');
>> quit
$ mcc -m mywrapper.m myfunction.m
$ mkdir test
$ cp mywrapper test
$ cp run_mywrapper.sh test
$ cp myjob.sub test
$ cd test

To obtain the name of the compute node which runs the compiler-generated script run_mywrapper.sh, insert the Linux commands echo and hostname before the first echo statement so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_mywrapper.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/mywrapper $*
fi
exit

Submit the job to a single compute node, requesting one processor core and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

This job runs myjob.sub on a compute node, which in turn submits the parallel job. The first job must run at least as long as the job with the parallel loop since it collects the results of the parallel job.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   1    --  00:05 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   1    --  00:05 R 00:00
115293.rossmann-ad myusername      standby  Job1        29390   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows that this job submits a second job with four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a021.rcac.purdue.edu
run_mywrapper.sh
rossmann-a021.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.

SERIAL REGION:  hostname:rossmann-a021.rcac.purdue.edu

Starting matlabpool using the 'mypbsprofile' configuration ... connected to 4 labs.
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a021.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a022.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a023.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a024.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a021.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a022.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a023.rcac.purdue.edu            4         1          8
PARALLEL LOOP:  rossmann-a024.rcac.purdue.edu            4         1          7
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

SERIAL REGION:  hostname:rossmann-a021.rcac.purdue.edu
Elapsed time in parallel loop:   5.125206

Output shows the name of the compute node (a021) that ran the job submission file myjob.sub and the compiler-generated script run_mywrapper.sh, the name of the compute node (a021) that ran the two serial regions, and the names of the four compute nodes (a021,a022,a023,a024) that processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop. Finally, the output shows the time that the four labs spent running the eight iterations of the parfor loop, which decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsprofile by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased. Increase the value of MATLAB_Distrib_Comp_Server in the qsub command to match the new size of the pool.

For more information about MATLAB Parallel Computing Toolbox:

MATLAB Parallel Computing Toolbox (spmd)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. PCT enables task and data parallelism on a multicore processor. It offers a shared-memory computing environment with a maximum of eight MATLAB workers (labs, threads) in version R2009a and 12 workers in version R2011a running on the local configuration in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the coarse-grained parallelism of a parallel region (spmd) in a pool job. Areas of application include SPMD (single program, multiple data) problems.

This section illustrates how to submit a small, parallel, MATLAB program with a parallel region (spmd statement) as a batch, MATLAB pool job to a PBS queue. The MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each parallel region of the pool. The system function hostname returns two values: a numerical code and the name of the compute nodes that run the parallel regions.

This example uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS cluster profile that scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server, so it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the spmd statement. This job runs completely off the front end.

Prepare a MATLAB script M-file called myscript.m and a MATLAB function M-file myfunction.m with matlabpool statements:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
fprintf('                    hostname                         numlabs  labindex\n')
fprintf('                    -------------------------------  -------  --------\n')
tic;

% PARALLEL REGION
spmd
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL REGION:  %-31s  %7d  %8d\n', name,numlabs,labindex)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel region
matlabpool close;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel region:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "r" is a "composite object."
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    r = sprintf('                  hostname                         numlabs  labindex');
    result = strvcat(result,r);
    r = sprintf('                  -------------------------------  -------  --------');
    result = strvcat(result,r);
    tic;

    % PARALLEL REGION
    spmd
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL REGION:  %-31s  %7d  %8d', name,numlabs,labindex);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel region 
    for ndx=1:length(r)          % concatenate composite object "r"
        result = strvcat(result,r{ndx});
    end
    matlabpool close;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('elapsed time:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

matlab -nodisplay -r myscript

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> parallel.defaultClusterProfile('mypbsprofile');
>> quit;
$

Submit the job to a single compute node, requesting one processor core, one PCT license, and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00
332028.rossmann-ad myusername      standby  Job1          668   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a001.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

SERIAL REGION:  hostname:rossmann-a001.rcac.purdue.edu

Starting matlabpool using the 'mypbsprofile' profile ... connected to 4 labs.
                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 2:
  PARALLEL REGION:  rossmann-a002.rcac.purdue.edu            4         2
Lab 1:
  PARALLEL REGION:  rossmann-a001.rcac.purdue.edu            4         1
Lab 3:
  PARALLEL REGION:  rossmann-a003.rcac.purdue.edu            4         3
Lab 4:
  PARALLEL REGION:  rossmann-a004.rcac.purdue.edu            4         4

Sending a stop signal to all the labs ... stopped.


SERIAL REGION:  hostname:rossmann-a001.rcac.purdue.edu
Elapsed time in parallel region:   3.382151

Output shows the name of one compute node (a001) that processed the job submission file myjob.sub and the two serial regions. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a001,a002,a003,a004) that processed the four parallel regions. The total elapsed time demonstrates that the jobs ran in parallel.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about MATLAB Parallel Computing Toolbox:

MATLAB Distributed Computing Server (parallel job)

The MATLAB Parallel Computing Toolbox (PCT) offers a parallel job via the MATLAB Distributed Computing Server (DCS). The tasks of a parallel job are identical, run simultaneously on several MATLAB workers (labs), and communicate with each other. This section illustrates an MPI-like program. Areas of application include distributed arrays and message passing.

This section illustrates how to submit a small, MATLAB parallel job with four workers running one MPI-like task to a PBS queue. The MATLAB program broadcasts an integer, which might be the number of slices of a numerical integration, to four workers and gathers the names of the compute nodes running the workers and the lab IDs of the workers. The system function hostname returns two values: a numerical code and the name of the compute nodes that run the program.

This example uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS cluster profile that scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server, so it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the parallel job. This job runs completely off the front end.
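
The broadcast value in this example is only echoed back, but the same pattern can drive real work. The following is a minimal sketch, not part of the example below, of how the broadcast integer could serve as the number of slices of a numerical integration: each lab integrates its share of the slices of 4/(1+x^2) over [0,1] (which approximates pi), and gplus() sums the partial results across the labs:

% A sketch only:  distributed midpoint-rule integration driven by a
% broadcast slice count.
matlabpool open 4;
spmd
    if labindex == 1
        N = labBroadcast(1,int64(1000));    % lab 1 broadcasts the slice count
    else
        N = labBroadcast(1);                % the other labs receive it
    end
    N = double(N);
    h = 1/N;                                % width of each slice
    partial = 0;
    for i = labindex:numlabs:N              % round-robin slice assignment
        x = h*(i - 0.5);                    % midpoint of slice i
        partial = partial + 4/(1 + x^2);
    end
    approx_pi = gplus(h*partial);           % global sum across all labs
    if labindex == 1
        fprintf('Approximation of pi: %.10f\n', approx_pi);
    end
end
matlabpool close;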

Prepare a MATLAB script M-file myscript.m and a MATLAB function M-file myfunction.m:

% FILENAME:  myscript.m

% Specify pool size.
% Convert the parallel job to a pool job.
matlabpool open 4;
spmd


if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if labindex == 1
    disp(result)
end


end   % spmd
matlabpool close force;
quit;

% FILENAME:  myfunction.m


function result = myfunction ()

    result = 0;

    % Specify pool size.
    % Convert the parallel job to a pool job.
    matlabpool open 4;
    spmd

    if labindex == 1
        % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
        N = labBroadcast(1,int64(1000));
    else
        % Each lab (rank) receives the broadcast value from lab (rank) #1.
        N = labBroadcast(1);
    end

    % Form a string with host name, total number of labs, lab ID, and broadcast value.
    [c name] =system('hostname');
    name = name(1:length(name)-1);
    fmt = num2str(floor(log10(numlabs))+1);
    str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

    % Apply global concatenate to all str's.
    % Store the concatenation of str's in the first dimension (row) and on lab #1.
    rslt = gcat(str,1,1);

    end   % spmd
    result = rslt{1};
    matlabpool close force;

end   % function

Also, prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r myscript

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit;
$

Submit the job to a single compute node, requesting one processor core, one PCT license, and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
465534.rossmann-ad myusername      standby  myjob.sub    5620   1   1    --  00:05 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
465534.rossmann-ad myusername      standby  myjob.sub    5620   1   1    --  00:05 R 00:00
465545.rossmann-ad myusername      standby  Job2          --    4   4    --  00:01 R   -- 

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

myjob.sub
rossmann-a006.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
Lab 1:
  rossmann-a006.rcac.purdue.edu:4:1:1000
  rossmann-a007.rcac.purdue.edu:4:2:1000
  rossmann-a008.rcac.purdue.edu:4:3:1000
  rossmann-a009.rcac.purdue.edu:4:4:1000
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows the name of one compute node (a006) that processed the job submission file myjob.sub. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a006,a007,a008,a009) that processed the four parallel regions. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() gathered onto lab 1 the string from each lab, which includes the name of that lab's compute node.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments.

For more information about parallel jobs:

Octave (Interpreting an M-file)

GNU Octave is a high-level, interpreted programming language for numerical computations. The Octave interpreter is the part of Octave which reads M-files, oct-files, and MEX-files and executes Octave statements. Octave is a structured language (similar to C) and is mostly compatible with MATLAB. You may use Octave to avoid the need for a MATLAB license, both during development and as a deployed application. By doing so, you may be able to run your application on more systems or distribute it to others more easily.

This section illustrates how to submit a small Octave job to a PBS queue. This Octave example computes the inverse of a matrix.

Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Invert matrix A.
A = [1 2 3; 4 5 6; 7 8 0]
inv(A)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR

unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

The command octave myjob.m (without the redirection) also works in the preceding script.

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave

unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q << EOF
octave << EOF

% Invert matrix A.
A = [1 2 3; 4 5 6; 7 8 0]
inv(A)

quit
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
A =

   1   2   3
   4   5   6
   7   8   0

ans =

  -1.77778   0.88889  -0.11111
   1.55556  -0.77778   0.22222
  -0.11111   0.22222  -0.11111

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Octave:

Octave Compiler (Compiling an M-file)

Octave does not offer a compiler to translate an M-file into an executable file for additional speed or distribution. You may wish to consider recoding an M-file as either an oct-file or a stand-alone program.

Octave Executable (Oct-file)

An oct-file is an "Octave Executable". It offers a way for Octave code to call functions written in C, C++, or Fortran as though these external functions were built-in Octave functions. You may wish to use an oct-file if you would like to call an existing C, C++, or Fortran function directly from Octave rather than reimplementing that code as an Octave function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than Octave, you may be able to substantially improve performance over Octave source code, especially for statements like for and while.

This section illustrates how to submit a small Octave job with an oct-file to a PBS queue. This Octave example calls a C function which adds two matrices.

Prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

Combine the computational routine with an oct-file, which contains the necessary external function interface of Octave. The name of the file is matrixSum.cc:

/**********************************************************
 * FILENAME:  matrixSum.cc
 *
 * Adds two MxN arrays (inMatrix).
 * Outputs one MxN array (outMatrix).
 *
 * The calling syntax is:
 *
 *      matrixSum (inMatrix, inMatrix, outMatrix, size)
 *
 * This is an oct-file for Octave.
 *
 **********************************************************/

#include <octave/oct.h>

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Gateway Function */
DEFUN_DLD (matrixSum, args, nargout, "matrixSum: A + B") {

    NDArray inMatrix_a;                /* mxn input matrix   */
    NDArray inMatrix_b;                /* mxn input matrix   */
    int nrows_a,ncols_a;               /* size of matrix a   */
    int nrows_b,ncols_b;               /* size of matrix b   */
    NDArray outMatrix_c;               /* mxn output matrix  */

    /* Check for proper number of input arguments */
    if (args.length() != 2) {
       printf("matrixSum:  two inputs required.");
       exit(-1);
    }
    /* Check for proper number of output arguments */
    if (nargout != 1) {
       printf("matrixSum:  one output required.");
       exit(-1);
    }

    /* Check that both input matrices are real matrices. */
    if (!args(0).is_real_matrix()) {
       printf("matrixSum:  expecting LHS (arg 1) to be a real matrix");
       exit(-1);
    }
    if (!args(1).is_real_matrix()) {
       printf("matrixSum:  expecting RHS (arg 2) to be a real matrix");
       exit(-1);
    }

    /* Get dimensions of the first input matrix */
    nrows_a = args(0).rows();
    ncols_a = args(0).columns();
    /* Get dimensions of the second input matrix */
    nrows_b = args(1).rows();
    ncols_b = args(1).columns();

    /* Check for equal number of rows. */
    if(nrows_a != nrows_b) {
       printf("matrixSum:  unequal number of rows.");
       exit(-1);
    }
    /* Check for equal number of columns. */
    if(ncols_a != ncols_b) {
       printf("matrixSum:  unequal number of rows.");
       exit(-1);
    }

    /* Make a pointer to the real data in the first input matrix  */
    inMatrix_a = args(0).array_value();
    /* Make a pointer to the real data in the second input matrix  */
    inMatrix_b = args(1).array_value();

    /* Construct output matrix as a copy of the first input matrix. */
    outMatrix_c = args(0).array_value();

    /* Call the computational routine.  */
    double* ptr_a = inMatrix_a.fortran_vec();
    double* ptr_b = inMatrix_b.fortran_vec();
    double* ptr_c = outMatrix_c.fortran_vec(); 
    matrixSum(ptr_a,ptr_b,ptr_c,nrows_a*ncols_a);

    return octave_value(outMatrix_c);
}

To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

$ module load octave

To compile matrixSum.cc into an oct-file:

$ mkoctfile matrixSum.cc

Two new files appear after the compilation:

matrixSum.o
matrixSum.oct

The name of the Octave-callable oct-file is matrixSum.oct.
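
Before preparing a batch job, you can optionally verify the oct-file interactively on a front end with the octave module loaded (a quick sketch; your prompt and formatting may differ slightly):

$ octave -q
octave:1> matrixSum([1,1,1;1,1,1], [2,2,2;2,2,2])
ans =

   3   3   3
   3   3   3

octave:2> quit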

Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Call the separately compiled and dynamically linked oct-file.
A = [1,1,1;1,1,1]
B = [2,2,2;2,2,2]
C = matrixSum(A,B)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
A =

   1   1   1
   1   1   1

B =

   2   2   2
   2   2   2

C =

   3   3   3
   3   3   3

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the Octave oct-file:

Octave Standalone Program

A stand-alone Octave program is a C, C++, or Fortran program which calls user-written oct-files and the same libraries that Octave uses. A stand-alone program has access to Octave objects, such as the array and matrix classes, as well as all of the Octave algorithms. If you would like to implement performance-critical routines in C, C++, or Fortran and still call select Octave functions, a stand-alone Octave program may be a good option. This offers the possibility of substantially improved performance over Octave source code, especially for statements like for and while, while still allowing the use of specialized Octave functions where useful.

This section illustrates how to submit a small, stand-alone Octave program to a PBS queue. This C++ example uses class Matrix and calls an Octave script which prints a message.

Prepare an Octave-compatible M-file with an appropriate filename, here named hello.m:

% FILENAME:  hello.m

disp('hello.m:    hello, world')

Prepare a C++ function file with the necessary external function interface and with an appropriate filename, here named hello.cc:

// FILENAME:  hello.cc

#include <iostream>
#include <octave/oct.h>
#include <octave/octave.h>
#include <octave/parse.h>
#include <octave/toplev.h> /* do_octave_atexit */

int main (const int argc, char ** argv) {

    const char * argvv [] = {"" /* name of program, not relevant */, "--silent"};
    octave_main (2, (char **) argvv, true /* embedded */);

    std::cout << "hello.cc:   hello, world" << std::endl;

    const octave_value_list result = feval ("hello");  /* invoke hello.m */

    int n = 2;
    Matrix a_matrix = Matrix (1,n);   /* a 1x2 matrix */
    a_matrix (0,0) = 888;
    a_matrix (0,1) = 999;
    std::cout << "hello.cc:   " << a_matrix;

    do_octave_atexit ();

}

To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

$ module load octave

To compile the stand-alone Octave program:

$ mkoctfile --link-stand-alone hello.cc -o hello

Two new files appear after the compilation:

hello
hello.o

The name of the compiled, stand-alone Octave program is hello.
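
As a quick check, you can run the program interactively before submitting it (a sketch; hello.m must be in the current directory so that feval() can find it, and the octave module should still be loaded from the compile step so the Octave shared libraries are found):

$ ./hello
hello.cc:   hello, world
hello.m:    hello, world
hello.cc:    888 999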

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load gcc
cd $PBS_O_WORKDIR
unset DISPLAY

./hello

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello.cc:   hello, world
hello.m:    hello, world
hello.cc:    888 999

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the Octave stand-alone program:

Octave (MEX-file)

MEX stands for "MATLAB Executable". A MEX-file offers a way for MATLAB code to call functions written in C, C++ or Fortran as though these external functions were built-in MATLAB functions. You may wish to use a MEX-file if you would like to call an existing C, C++, or Fortran function directly from MATLAB rather than reimplementing that code as a MATLAB function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than MATLAB, you may be able to substantially improve performance over MATLAB source code, especially for statements like for and while.

Octave includes an interface which can link compiled, legacy MEX-files. This interface allows sharing code between Octave and MATLAB users. In Octave, an oct-file will always perform better than a MEX-file, so you should write new code using the oct-file interface, if possible. However, you may test a new MEX-file in Octave and then use it in a MATLAB application.

This section illustrates how to submit a small Octave job with a MEX-file to a PBS queue. This Octave example calls a C function which adds two matrices.

Prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

Combine the computational routine with a MEX-file, which contains the necessary external function interface of MATLAB. In the computational routine, change int to mwSize. The name of the file is matrixSum.c:

/*************************************************************
 * FILENAME:  matrixSum.c
 *
 * Adds two MxN arrays (inMatrix).
 * Outputs one MxN array (outMatrix).
 *
 * The calling syntax is:
 *
 *      matrixSum(inMatrix, inMatrix, outMatrix, size)
 *
 * This is a MEX-file which Octave will execute.
 *
 **************************************************************/

#include "mex.h"

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, mwSize n) {
    mwSize i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Gateway Function */
void mexFunction (int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[]) {

    double *inMatrix_a;               /* mxn input matrix  */
    double *inMatrix_b;               /* mxn input matrix  */
    mwSize nrows_a,ncols_a;           /* size of matrix a  */
    mwSize nrows_b,ncols_b;           /* size of matrix b  */
    double *outMatrix_c;              /* mxn output matrix */

    /* Check for proper number of arguments */
    if(nrhs!=2) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nrhs","Two inputs required.");
    }
    if(nlhs!=1) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nlhs","One output required.");
    }

    /* Get dimensions of the first input matrix */
    nrows_a = mxGetM(prhs[0]);
    ncols_a = mxGetN(prhs[0]);
    /* Get dimensions of the second input matrix */
    nrows_b = mxGetM(prhs[1]);
    ncols_b = mxGetN(prhs[1]);

    /* Check for equal number of rows. */
    if(nrows_a != nrows_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of rows.");
    }
    /* Check for equal number of columns. */
    if(ncols_a != ncols_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of columns.");
    }

    /* Make a pointer to the real data in the first input matrix  */
    inMatrix_a = mxGetPr(prhs[0]);
    /* Make a pointer to the real data in the second input matrix  */
    inMatrix_b = mxGetPr(prhs[1]);

    /* Make the output matrix */
    plhs[0] = mxCreateDoubleMatrix(nrows_a,ncols_a,mxREAL);

    /* Make a pointer to the real data in the output matrix */
    outMatrix_c = mxGetPr(plhs[0]);

    /* Call the computational routine */
    matrixSum(inMatrix_a,inMatrix_b,outMatrix_c,nrows_a*ncols_a);
}

To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

$ module load octave

To compile matrixSum.c into a MEX-file:

$ mkoctfile --mex matrixSum.c

Two new files appear after the compilation:

matrixSum.mex
matrixSum.o

The name of the Octave-callable MEX-file is matrixSum.mex.
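
As with the oct-file, you can optionally check the MEX-file interactively before batch submission (a sketch, with the octave module loaded):

$ octave -q
octave:1> C = matrixSum([1,1,1;1,1,1], [2,2,2;2,2,2])
C =

   3   3   3
   3   3   3

octave:2> quit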

Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Call the separately compiled and dynamically linked MEX-file.
A = [1,1,1;1,1,1]
B = [2,2,2;2,2,2]
C = matrixSum(A,B)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
A =

   1   1   1
   1   1   1

B =

   2   2   2
   2   2   2

C =

   3   3   3
   3   3   3

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the Octave-compatible MEX-file:

Perl

Perl is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Perl job to a PBS queue. This Perl example prints a single line of text.

Prepare a Perl input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

print "hello, world\n"

Discover the absolute path of Perl:

$ which perl
/usr/local/bin/perl

A second Perl installation is also available at the absolute path /usr/bin/perl; the job submission file below uses that path.
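
Alternatively, you can place the interpreter path in a shebang line and make the input file executable; this sketch uses the /usr/bin/perl path noted above:

#!/usr/bin/perl -w
# FILENAME:  myjob.in

print "hello, world\n";

Make the file executable, then run it directly (from the command line or from a job submission file):

$ chmod +x myjob.in
$ ./myjob.in
hello, world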

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -w option to issue warnings.
/usr/bin/perl -w myjob.in

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about Perl:

Python

Python is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Python job to a PBS queue. This Python example prints a single line of text.

Prepare a Python input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

import string, sys
print "hello, world"

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load python
cd $PBS_O_WORKDIR
unset DISPLAY

python myjob.in

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.

If you would like to install a Python package for your own personal use, you may do so by following these directions. Make sure you have a download link to the source tarball of the package you want to use and substitute it on the wget line.

$ mkdir ~/src
$ cd ~/src
$ wget http://path/to/source/tarball/app-1.0.tar.gz
$ tar xzvf app-1.0.tar.gz
$ cd app-1.0
$ module load python
$ python setup.py install --user
$ cd ~
$ python
>>> import app
>>> quit()

The "import app" line should return without any output if installed successfully. You can then import the package in your python scripts.

For more information about Python:

R

R, a GNU project, is a language and environment for statistics and graphics. It is an open source version of the S programming language. This section illustrates how to submit a small R job to a PBS queue. This R example computes a Pythagorean triple.

Prepare an R input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load r
cd $PBS_O_WORKDIR

# --vanilla: combine --no-save, --no-restore, --no-site-file, --no-init-file, and --no-environ
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.in

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load r

# --vanilla: combine --no-save, --no-restore, --no-site-file, --no-init-file, and --no-environ
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save << EOF

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # FILENAME:  myjob.in
>
> # Compute a Pythagorean triple.
> a = 3
> b = 4
> c = sqrt(a*a + b*b)
> c     # display result
[1] 5
>

Any output written to standard error will appear in myjob.sub.emyjobid.

To install additional R packages, create a folder in your home directory called Rlibs. You will need to be running a recent version of R (2.14.0 or greater as of this writing):

$ mkdir ~/Rlibs

If you are running the bash shell (the default on our clusters), add the following line to your ~/.bashrc (create the file if it does not already exist; you may also need to run "ln -s .bashrc .bash_profile" if ~/.bash_profile does not exist either):

export R_LIBS=~/Rlibs:$R_LIBS

If you are running csh or tcsh, add the following to your .cshrc:

setenv R_LIBS ~/Rlibs:$R_LIBS

Now run "source .bashrc" and start R:

$ module load r 
$ R
> .libPaths()
[1] "/home/myusername/Rlibs"        
[2] "/apps/rhel5/R-2.14.0/lib64/R/library"

.libPaths() should output something similar to above if it is set up correctly. Now let's try installing a package.

> install.packages('packagename',"~/Rlibs","http://streaming.stat.iastate.edu/CRAN")

The above command should download and install the requested R package, which upon completion can then be loaded.

> library('packagename')

If your R package relies on a library that's only installed as a module (for this example we'll use GDAL), you can install it by doing the following:

$ module load gdal
$ module load r
$ R
> install.packages('rgdal', "~/Rlibs", "http://streaming.stat.iastate.edu/CRAN", configure.args="--with-gdal-include=$GDAL_HOME/include --with-gdal-lib=$GDAL_HOME/lib")

Repeat install.packages(...) for any packages that you need. Your R packages should now be installed.
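
If a package was built against a module-provided library (GDAL in this example), the job submission file should load that module as well so the shared library can be found at run time. A minimal sketch:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load gdal
module load r
cd $PBS_O_WORKDIR

R --vanilla --no-save < myjob.in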

For more information about R:

SAS

SAS (pronounced "sass") is an integrated system supporting statistical analysis, report generation, business planning, and forecasting. This section illustrates how to submit a small SAS job to a PBS queue. This SAS example displays a small dataset.

Prepare a SAS input file with an appropriate filename, here named myjob.sas:

* FILENAME:  myjob.sas ;

/* Display a small dataset. */
TITLE 'Display a Small Dataset';
DATA grades;
INPUT name $ midterm final;
DATALINES;
Anne     61 64
Bob      71 71
Carla    86 80
David    79 77
Edwardo  73 73
Fannie   81 81
;
PROC PRINT data=grades;
RUN;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load sas
cd $PBS_O_WORKDIR

# -stdio:   run SAS in batch mode:
#              read SAS input from stdin
#              write SAS output to stdout
#              write SAS log to stderr
# -nonews:  do not display SAS news
# SAS runs in batch mode when the name of the SAS command file
# appears as a command-line argument.
sas -stdio -nonews myjob

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

                                                           The SAS System                       10:59 Wednesday, January 5, 2011   1

                                                 Obs    name       midterm    final

                                                  1     Anne          61        64
                                                  2     Bob           71        71
                                                  3     Carla         86        80
                                                  4     David         79        77
                                                  5     Edwardo       73        73
                                                  6     Fannie        81        81

View the SAS log in the standard error file, myjob.sub.emyjobid:

1                                                          The SAS System                           12:32 Saturday, January 29, 2011

NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.2 (TS2M0)
      Licensed to PURDUE UNIVERSITY - T&R, Site 70063312.
NOTE: This session is executing on the Linux 2.6.18-194.17.1.el5rcac2 (LINUX) platform.



NOTE: SAS initialization used:
      real time           0.70 seconds
      cpu time            0.03 seconds

1          * FILENAME:  myjob.sas
2
3          /* Display a small dataset. */
4          TITLE 'Display a Small Dataset';
5          DATA grades;
6          INPUT name $ midterm final;
7          DATALINES;

NOTE: The data set WORK.GRADES has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.18 seconds
      cpu time            0.01 seconds


14         ;
15         PROC PRINT data=grades;
16         RUN;

NOTE: There were 6 observations read from the data set WORK.GRADES.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.32 seconds
      cpu time            0.04 seconds


NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           1.28 seconds
      cpu time            0.08 seconds

For more information about SAS:

Running Jobs via HTCondor

HTCondor allows you to run jobs on systems which would otherwise sit idle, for as long as their primary users do not need them. HTCondor is one of several distributed computing systems which ITaP makes available. Most ITaP research resources, in addition to being available through normal means, are part of BoilerGrid and are accessible via HTCondor. If a primary user needs a processor core on a compute node, HTCondor immediately checkpoints or migrates all HTCondor jobs on that compute node and returns the resource to the primary user. Thus, shorter jobs will have a better completion rate via HTCondor than longer jobs; however, even though HTCondor may have to restart jobs elsewhere, BoilerGrid can offer a vast amount of computational resources. Nearly all ITaP research systems are part of BoilerGrid, as are large numbers of lab machines at the West Lafayette and other Purdue campuses; BoilerGrid is one of the largest HTCondor pools in the world. Some machines at other institutions are also part of a larger HTCondor federation known as DiaGrid and are available as well.
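
Submitting a job to HTCondor uses a submit description file rather than a PBS script. A minimal sketch for a serial (vanilla universe) job follows; the executable, argument, and output file names are placeholders, and BoilerGrid may require additional settings described in the HTCondor documentation:

# FILENAME:  myjob.condor

universe   = vanilla
executable = myprogram
arguments  = arg1 arg2
output     = myjob.out
error      = myjob.err
log        = myjob.log
queue

Submit and monitor the job with:

$ condor_submit myjob.condor
$ condor_q myusername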

For more information:

Rossmann Frequently Asked Questions (FAQ)

There are currently no FAQs for Rossmann.