Rossmann - Complete User Guide

Conventions Used in this Document

This document follows certain typesetting and naming conventions:

  • Colored, underlined text indicates a link.
  • Colored, bold text highlights something of particular importance.
  • Italicized text notes the first use of a key concept or term.
  • Bold, fixed-width font text indicates a command or command argument that you type verbatim.
  • Examples of commands and output as you would see them on the command line will appear in colored blocks of fixed-width text such as this:
    $ example
    This is an example of commands and output.
    
  • All command line shell prompts appear as a single dollar sign ("$"). Your actual shell prompt may differ.
  • All examples work with bash or ksh shells. Where different, changes needed for tcsh or csh shell users appear in example comments.
  • All names that begin with "my" illustrate examples that you replace with an appropriate name. These include "myusername", "myfilename", "mydirectory", "myjobid", etc.
  • The term "processor core" or "core" throughout this guide refers to the individual CPU cores on a processor chip. All ITaP research systems schedule jobs on the basis of these processor cores, and not the physical processor chips. For example, no distinction would be made between a dual-processor, single-core machine and a single-processor, dual-core machine, as both contain a total of two processor cores.
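  • The term "compute node" refers to a single physical machine in the cluster. Each compute node contains one or more processor chips, and each processor chip contains one or more processor cores.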

Overview of Rossmann

Rossmann is a compute cluster operated by ITaP, and is a member of Purdue's Community Cluster Program. Rossmann went into production on September 1, 2010. It consists of HP (Hewlett Packard) ProLiant DL165 G7 nodes with 64-bit, dual 12-core AMD Opteron 6172 processors (24 cores per node), either 48 GB or 96 GB of memory, and 250 GB of local disk for system software and local scratch storage. Nodes with 192 GB of memory and either 1 TB or 2 TB of local scratch disk are also available. All nodes have 10 Gigabit Ethernet interconnects.

Namesake

Rossmann is named in honor of Michael Rossmann, Purdue's Hanley Distinguished Professor of Biological Sciences. More information about his life and impact on Purdue is available in an ITaP Biography of Michael Rossmann.

Detailed Hardware Specification

Rossmann consists of five logical sub-clusters, each with a different memory/storage configuration. All nodes in the cluster have dual 12-core AMD Opteron 6172 processors and 10 Gigabit Ethernet (10GigE).

Sub-Cluster  Number of Nodes  Processors per Node           Cores per Node  Memory per Node  Interconnect  Disk
Rossmann-A   393              Two 2.1 GHz 12-Core AMD 6172  24              48 GB            10 GigE       250 GB
Rossmann-B   40               Two 2.1 GHz 12-Core AMD 6172  24              96 GB            10 GigE       250 GB
Rossmann-C   2                Two 2.1 GHz 12-Core AMD 6172  24              192 GB           10 GigE       1 TB
Rossmann-D   4                Two 2.1 GHz 12-Core AMD 6172  24              192 GB           10 GigE       2 TB
Rossmann-H   11               Two 2.1 GHz 12-Core AMD 6172  24              48 GB            10 GigE       8 TB
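
As a quick check of the table above, the aggregate size of the cluster follows from the per-sub-cluster node counts and the 24 cores per node. A minimal sketch in shell arithmetic (the variable names are illustrative only):

```shell
# Node counts for sub-clusters A, B, C, D, and H, taken from the table above.
total_nodes=$((393 + 40 + 2 + 4 + 11))
# Every node has two 12-core processors, which gives 24 cores per node.
total_cores=$((total_nodes * 24))
echo "$total_nodes nodes, $total_cores processor cores"
```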

Rossmann nodes run Red Hat Enterprise Linux 5 (RHEL5) and use Moab Workload Manager 6 and TORQUE Resource Manager 3 as the portable batch system (PBS) for resource and job management. Rossmann also runs jobs for BoilerGrid whenever processor cores in it would otherwise be idle. The application of operating system patches occurs as security needs dictate. All nodes allow for unlimited stack usage, as well as unlimited core dump size (though disk space and server quotas may still be a limiting factor).

For more information about the TORQUE Resource Manager:

On Rossmann, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • PGI 11.8
  • ACML
  • OpenMPI 1.4.4

To load the recommended set:

$ module load devel

To verify what you loaded:

$ module list

Node Interconnect Systems

The system interconnect is the networking technology that connects nodes of a cluster to each other. This is often much faster and sometimes radically different from the networking available between a resource and other machines or the outside world. Interconnects have different characteristics that may affect parallel message-passing programs and their design. Each ITaP research resource has different interconnect options available, and some have more than one available to all or only portions of the resource's nodes. For information on which interconnects are available, refer to the hardware specification for the resource above. Details about the specific interconnects available on Rossmann follow.

10 Gigabit Ethernet

Ten Gigabit Ethernet (10GigE) is a form of Ethernet, currently the most widely used network link technology, that is able to transfer data at rates of approximately ten gigabits per second, one hundred times faster than 100 Mbps Ethernet. At these speeds, 10GigE cable runs must also be much shorter.
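
To put these rates in perspective, the sketch below estimates the ideal time to move a 100 GB data set at each speed. It assumes the full line rate with no protocol overhead, so real transfers will take longer; the data set size is an arbitrary example.

```shell
# 100 GB expressed in bits (decimal units: 1 GB = 10^9 bytes).
bits=$((100 * 1000000000 * 8))
# Ideal transfer time in seconds at 10 Gbps and at 100 Mbps.
seconds_10gige=$((bits / 10000000000))
seconds_100mbps=$((bits / 100000000))
echo "10GigE: ${seconds_10gige} s, 100 Mbps Ethernet: ${seconds_100mbps} s"
```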

Accounts on Rossmann

Purchasing Nodes

Information Technology at Purdue (ITaP) operates a significant shared cluster computing infrastructure developed over several years through focused acquisitions using funds from grants, faculty startup packages, and institutional sources. These "community clusters" are now at the foundation of Purdue's research cyberinfrastructure.

We strongly encourage any Purdue faculty or staff with computational needs to join this growing community and enjoy the enormous benefits this shared infrastructure provides:

  • Peace of Mind
    ITaP system administrators take care of security patches, attempted hacks, operating system upgrades, and hardware repair so faculty and graduate students can concentrate on research.
  • Low Overhead
    ITaP data centers provide infrastructure such as networking, racks, floor space, cooling, and power.
  • Cost Effective
    ITaP works with vendors to obtain the best price for computing resources by pooling funds from different disciplines to leverage greater group purchasing power.

Through the Community Cluster Program, Purdue affiliates have invested several million dollars in computational and storage resources from Q4 2006 to the present with great success in both the research accomplished and the money saved on equipment purchases.

ITaP is currently offering purchase of access to Carter. To purchase access to Carter today, go to the Carter Access Purchase page. For updates on ITaP's Community Cluster Program, please subscribe to the program's mailing list.

Cluster Partner Services

In addition to priority access to a number of processor cores, partners in our Community Cluster Program may also take advantage of additional services offered to them free of charge. These include:

  • Unix Group
    Restrict access to files or programs by using Unix file permissions on the basis of those you approve for access to your queues.
  • Application Storage
    Store your custom application binaries in central storage that is backed-up and available from all clusters, but not part of your personal home directory.
  • Subversion (SVN) Repository
    Store and manage your code or documents through a centrally-supported, professional-grade, revision control system.

To request any of these be created for your research group, or for more information, please email rcac-help@purdue.edu.

Obtaining an Account

To obtain an account, you must be part of a research group which has purchased access to Rossmann. Refer to the Accounts / Access page for more details on how to request access.

Login / SSH

To submit jobs on Rossmann, log in to the submission host rossmann.rcac.purdue.edu via SSH. This submission host is actually four front-end hosts: rossmann-fe00, rossmann-fe01, rossmann-fe02, and rossmann-fe03. The login process randomly assigns one of these four front-ends to each login to rossmann.rcac.purdue.edu. While the four front-end hosts are identical, each has its own /tmp. Because a later login may land on a different front-end, data left in /tmp during one session may not be visible in the next. ITaP advises using scratch storage for multisession, shared data instead.
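
If you log in often, an OpenSSH client configuration entry can shorten the command. The following is a sketch only: the alias "rossmann" and the demonstration file are assumptions made for illustration, and "myusername" is a placeholder as elsewhere in this guide. It writes to a temporary file so that your real ~/.ssh/config is left untouched.

```shell
# Write a demonstration OpenSSH client configuration to a temporary file.
demo_config=$(mktemp)
cat > "$demo_config" <<'EOF'
Host rossmann
    HostName rossmann.rcac.purdue.edu
    User myusername
EOF

# Show the entry. To use it, run: ssh -F "$demo_config" rossmann
cat "$demo_config"
```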

SSH Client Software

Secure Shell or SSH is a way of establishing a secure (encrypted) connection between two computers. It uses public-key cryptography to authenticate the remote computer and (optionally) to allow the remote computer to authenticate the user. Its usual function involves logging in to a remote machine and executing commands, but it also supports tunneling and forwarding of X11 or arbitrary TCP connections. There are many SSH clients available for all operating systems.

Linux / Solaris / AIX / HP-UX / Unix:

  • The ssh command is pre-installed. Log in using ssh myusername@servername.

Microsoft Windows:

  • PuTTY is an extremely small download of a free, full-featured SSH client.
  • Secure CRT is a commercial SSH client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • The ssh command is pre-installed. You may start a local terminal window from "Applications->Utilities". Log in using ssh myusername@servername.
  • MacSSH is another free SSH client.

SSH Keys

SSH works with many different means of authentication. One popular authentication method is Public Key Authentication (PKA). PKA is a method of establishing your identity to a remote computer using related sets of encryption data called keys. PKA is a more secure alternative to traditional password-based authentication with which you are probably familiar.

To employ PKA via SSH, you manually generate a keypair (also called SSH keys) in the location from where you wish to initiate a connection to a remote machine. This keypair consists of two text files: a private key and a public key. You keep the private key file confidential on your local machine or local home directory (hence the name "private" key). You then log in to a remote machine (if possible) and append the corresponding public key text to the end of a specific file, or have a system administrator do so on your behalf. In future login attempts, PKA compares the public and private keys to verify your identity; only then do you have access to the remote machine.

As a user, you can create, maintain, and employ as many keypairs as you wish. If you connect to a computational resource from your work laptop, your work desktop, and your home desktop, you can create and employ keypairs on each. You can also create multiple keypairs on a single local machine to serve different purposes, such as establishing access to different remote machines or establishing different types of access to a single remote machine. In short, PKA via SSH offers a secure but flexible means of identifying yourself to all kinds of computational resources.

Passphrases and SSH Keys

Creating a keypair prompts you to provide a passphrase for the private key. This passphrase is different from a password in a number of ways. First, a passphrase is, as the name implies, a phrase. It can include most types of characters, including spaces, and has no limits on length. Second, the remote machine does not receive this passphrase for verification. Its only purpose is to unlock your local private key, and each passphrase applies to one particular private key.

Perhaps you are wondering why you would need a private key passphrase at all when using PKA. If the private key remains secure, why the need for a passphrase just to use it? Indeed, if the location of your private keys were always completely secure, a passphrase might not be necessary. In reality, a number of situations could arise in which someone may improperly gain access to your private key files. In these situations, a passphrase offers another level of security for you, the user who created the keypair.

Think of the private key/passphrase combination as being analogous to your ATM card/PIN combination. The ATM card itself is the object that grants access to your important accounts, and as such, should remain secure at all times—just as a private key should. But if you ever lose your wallet or someone steals your ATM card, you are glad that your PIN exists to offer another level of protection. The same is true for a private key passphrase.

When you create a keypair, you should always provide a corresponding private key passphrase. For security purposes, avoid using phrases which automated programs can discover (e.g. phrases that consist solely of words in English-language dictionaries). This passphrase is not recoverable if forgotten, so make note of it. Only a few situations warrant using a non-passphrase-protected private key—conducting automated file backups is one such situation. If you need to use a non-passphrase-protected private key to conduct automated backups to Fortress, see the No-Passphrase SSH Keys section.
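
The steps described above can be sketched with the OpenSSH ssh-keygen tool. This is an illustration under stated assumptions: the key is written to a throwaway directory, the file name demo_key and the key comment are hypothetical, and the passphrase is supplied with -N only so that the example runs unattended. For a real key, omit -N so that ssh-keygen prompts for the passphrase, and keep the private key in a private location such as ~/.ssh.

```shell
# Generate a demonstration RSA keypair in a temporary directory.
key_dir=$(mktemp -d)
demo_passphrase='Example passphrase, 4 demo use only!'   # placeholder only
ssh-keygen -q -t rsa -b 4096 -C "myusername demo key" \
    -N "$demo_passphrase" -f "$key_dir/demo_key"

# demo_key is the private key; demo_key.pub is the public key whose text
# is appended on the remote machine (~/.ssh/authorized_keys with OpenSSH).
ls "$key_dir"
```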

SSH X11 Forwarding

SSH supports tunneling of X11 (X-Windows). If you have an X11 server running on your local machine, you may use X11 applications on remote systems and have their graphical displays appear on your local machine. These X11 connections are tunneled and encrypted automatically by your SSH client.

Installing an X11 Server

To use X11, you will need to have a local X11 server running on your personal machine. Both free and commercial X11 servers are available for various operating systems.

Linux / Solaris / AIX / HP-UX / Unix:

  • An X11 server is at the core of all graphical sessions. If you are logged in to a graphical environment on these operating systems, you are already running an X11 server.

Microsoft Windows:

  • Xming is a free X11 server available for all versions of Windows, although it may occasionally hang and require a restart. Download the "Public Domain Xming" or donate to the development for the newest version.
  • Hummingbird eXceed is a commercial X11 server available for all versions of Windows.
  • Cygwin is another free X11 server available for all versions of Windows. Download and run setup.exe. During installation, you must select the following packages which are not included by default:
    • X-startup-scripts
    • XFree86-lib-compat
    • xorg-*
    • xterm
    • xwinwm
    • lib-glitz-glx1
    • opengl (if you also want OpenGL support, under the Graphics group)
    Once you are running the Cygwin X server, start an xterm, type XWin -multiwindow in it, and then press enter. You may now run your SSH client.

Mac OS X:

  • X11 is available as an optional install on the Mac OS X v10.3 Panther and v10.4 Tiger install disks. Run the installer, select the X11 option, and follow the instructions.

Enabling X11 Forwarding in your SSH Client

Once you are running an X11 server, you will need to enable X11 forwarding/tunneling in your SSH client:

  • "ssh": X11 tunneling should be enabled by default. To be certain it is enabled, you may use ssh -Y.
  • PuTTY: Prior to connection, in your connection's options, under "Tunnels", check "Enable X11 forwarding", and save your connection.
  • Secure CRT: Right-click a saved connection, and select "Properties". Expand the "Connection" settings, then go to "Port Forwarding" -> "Remote/X11". Check "Forward X11 packets" and click "OK".

SSH will set the remote environment variable $DISPLAY to "localhost:XX.YY" when this is working correctly. If you had previously set your $DISPLAY environment variable to your local IP or hostname, you must remove any set/export/setenv of this variable from your login scripts. The environment variable $DISPLAY must be left as SSH sets it, which is to a random local port address. Setting $DISPLAY to an IP or hostname will not work.
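
A quick way to confirm that forwarding is working is to inspect $DISPLAY after logging in. The helper below is a hypothetical sketch (the function name check_display is not an ITaP tool); it relies only on the "localhost:XX.YY" form described above.

```shell
# Classify a DISPLAY value: set by SSH, unset, or overridden by a login script.
check_display() {
    case "${1:-}" in
        localhost:*) echo "forwarded" ;;
        "")          echo "unset" ;;
        *)           echo "overridden" ;;
    esac
}

# Report on the current session.
echo "DISPLAY is $(check_display "${DISPLAY:-}")"
```

A result of "unset" suggests that forwarding is not enabled in your SSH client, while "overridden" suggests that a login script is still setting the variable.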

Passwords

If you have received a default password as part of the process of obtaining your account, you should change it immediately when you log on for the first time. Change your password from any terminal/SSH session with the command passwd. You will have the same password on all ITaP systems. If you change your password on any one ITaP system, it will change on all ITaP systems.

If you already have a Purdue career account, then you will initially receive the same username and password as your career account. There is no need to change your career account password because you have received an account on ITaP systems.

ITaP research systems do not currently require you to change your password on any schedule. For security reasons, however, changing your password every six months, and preferably every three months, is good practice. Other systems on campus linked to your career account do require periodic changes.

A password should employ all of the following features:

  • Something you have never used as a password before, on this or any other system.
  • Easy for you to remember and difficult for others to guess.
  • At least eight characters long.
  • A combination of uppercase and lowercase letters, numbers, and symbols.
TIP: A recommended password is an abbreviation of a sentence or song lyric: "The dog Samson ate 4 new slippers!" = "TdSa4ns!"

Never share your password with another user or make your password known to anyone else. Systems staff will NEVER ask for your password, by email or otherwise.

Email

There is no local email delivery available on Rossmann. Rossmann forwards all email which it receives to mail.rcac.purdue.edu for delivery.

Login Shell

Your shell is the program that generates your command-line prompt and processes commands. On ITaP research systems, several common shell choices are available:

  • bash (/bin/bash): A Bourne-shell (sh) compatible shell with many newer advanced features as well. Bash is one of the most common shells in use today.
  • tcsh (/bin/tcsh): An advanced variant on csh with all the features of modern shells. Tcsh is probably the second most popular shell in use today.
  • zsh (/bin/zsh): An advanced shell which incorporates all the functionality of bash, tcsh, and ksh combined, usually with identical syntax. In spite of this, zsh is not in common use.
  • csh (/bin/csh): The original C-style shell. Because tcsh offers all the functionality of csh and more, use csh only when you have specific csh-only scripts.
  • ksh (/bin/ksh): Korn shell, which was an early Bourne-shell compatible shell with some additional features. Unless you are already an adept ksh user, you would probably prefer bash.

To find out what shell you are running right now, simply use the ps command:

$ ps
  PID TTY          TIME CMD
30181 pts/27   00:00:00 bash
30273 pts/27   00:00:00 ps

To use a different shell on a one-time or trial basis, simply type the shell name as a command. To return to your original shell, type exit:

$ ps
  PID TTY          TIME CMD
30181 pts/27   00:00:00 bash
30273 pts/27   00:00:00 ps

$ tcsh
% ps
  PID TTY          TIME CMD
30181 pts/27   00:00:00 bash
30313 pts/27   00:00:00 tcsh
30315 pts/27   00:00:00 ps

% exit
$

To permanently change your default login shell, use the command chsh:

$ chsh

Changing login shell for myusername on *all* ACMAINT hosts.
Enter existing password: **********
Old shell: nologin
New shell [nologin]: /bin/tcsh

Changed 'loginShell' to '/bin/tcsh' for login 'myusername' on host(s) 'host123.rcac.purdue.edu host234.rcac.purdue.edu ...'.
Connection to data.rcac.purdue.edu closed.

There is a propagation delay which may last up to two hours. After the change has taken effect, your next login will start in your new shell. Moreover, you may change your shell again at any time by rerunning chsh.

File Storage and Transfer for Rossmann

Storage Options

File storage options on ITaP research systems include long-term storage (home directories, Fortress) and short-term storage (scratch directories, /tmp directory). Each option has different performance and intended uses, and some options vary from system to system as well. ITaP provides daily snapshots of home directories for a limited time for accidental deletion recovery. ITaP does not back up short-term storage and regularly purges old files from scratch and /tmp directories. More details about each storage option appear below.

Home Directories

ITaP provides home directories for long-term file storage. Each user ID has one home directory. You should use your home directory for storing important program files, scripts, input data sets, critical results, and frequently used files. You should store infrequently used files on Fortress. Your home directory becomes your current working directory, by default, when you log in.

ITaP provides daily snapshots of your home directory for a limited period of time in the event of accidental deletion. For additional security, you should store another copy of your files on more permanent storage, such as the Fortress HPSS Archive.

Your home directory physically resides within the Isilon storage system at Purdue. To find the path to your home directory, first log in then immediately enter the following:

$ pwd
/home/myusername

Or from any subdirectory:

$ echo $HOME
/home/myusername

Your home directory and its contents are available on all ITaP research front-end hosts and compute nodes via the Network File System (NFS).

Your home directory has a quota capping the size and/or number of files you may store within. For more information, refer to the Storage Quotas / Limits Section.

Lost Home Directory File Recovery

Only files which have been snap-shotted overnight are recoverable. If you lose a file the same day you created it, it is NOT recoverable.

To recover files lost from your home directory, use the flost command:

$ flost

Scratch Directories

ITaP provides scratch directories for short-term file storage only. Each file system domain has at least one scratch directory. Each user ID may access one scratch directory in a file system domain. The quota of your scratch directory is several times greater than the quota of your home directory. You should use your scratch directory for storing large temporary input files which your job reads or for writing large temporary output files which you may examine after execution of your job. You should use your home directory and Fortress for longer-term storage or for holding critical results.

Files in scratch directories are not recoverable. ITaP does not back up files in scratch directories. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

ITaP automatically removes (purges) from scratch directories all files stored for more than 90 days. Owners of these files receive a notice one week before removal via email. For more information, please refer to our Scratch File Purging Policy.
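
To see which files are approaching the purge age, something like the following may help. This is a sketch under assumptions: the function name is hypothetical, the 83-day default simply approximates the one-week notice period, and it uses modification time, which may not be the criterion the purge itself applies. Consult the Scratch File Purging Policy for the authoritative rule.

```shell
# List regular files under a directory whose modification time is older
# than a given number of days (default: 83).
list_purge_candidates() {
    find "${1:?directory required}" -type f -mtime +"${2:-83}" -print
}

# Check the scratch directory, or the current directory if $RCAC_SCRATCH is unset.
list_purge_candidates "${RCAC_SCRATCH:-.}"
```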

All users may access scratch directories on Rossmann. To find the path to your scratch directory:

$ findscratch
/scratch/lustreA/m/myusername

The value of variable $RCAC_SCRATCH is your scratch directory path. Use this variable in any scripts. Your actual scratch directory path may change without warning, but this variable will remain current.

$ echo $RCAC_SCRATCH
/scratch/lustreA/m/myusername
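
For example, a job script might resolve its working directory from the variable rather than from a literal path. In this sketch the directory name myjob_dir is a placeholder, and the fallback to a temporary directory when $RCAC_SCRATCH is unset is an assumption added so that the snippet also runs outside ITaP systems.

```shell
# Take the scratch path from the environment instead of a literal /scratch/... path.
scratch_root="${RCAC_SCRATCH:-${TMPDIR:-/tmp}}"
job_dir="$scratch_root/myjob_dir"
mkdir -p "$job_dir"
cd "$job_dir" || exit 1
echo "Working in $(pwd)"
```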

Your scratch directory on Rossmann may be shared with some other ITaP research resources and distinct from the scratch directories of others. All front-end/login nodes on all computational resources are able to access the scratch directories of all other computational resources. However, compute nodes are only able to access the scratch directory allocated to their own computational resource. ITaP may change which computational resources share scratch storage with which other computational resources as needs dictate. For more information about which computational resources share scratch volumes, please see the section Network Storage.

To find the path to someone else's scratch directory:

$ findscratch someusername
/scratch/lustreA/s/someusername

Your scratch directory has a quota capping the size and number of files you may store in it. For more information, refer to the section Storage Quotas / Limits.

/tmp Directory

ITaP provides /tmp directories for short-term file storage only. Each front-end and compute node has a /tmp directory. Your program may write temporary data to the /tmp directory of the compute node on which it is running. That data is available for as long as your program is active. Once your program terminates, that temporary data is no longer available. When used properly, /tmp may provide faster local storage to an active process than any other storage option. You should use your home directory and Fortress for longer-term storage or for holding critical results.

ITaP does not perform backups for the /tmp directory and removes files from /tmp whenever space is low or whenever the system needs a reboot. In the event of a disk crash or file purge, files in /tmp are not recoverable. You should copy any important files to more permanent storage.
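
A common pattern that follows from this is to work in a private subdirectory of /tmp and copy anything worth keeping to more permanent storage before the job ends. The sketch below uses hypothetical file and directory names, and falls back to the current directory when $RCAC_SCRATCH is unset.

```shell
# Create a private working directory under /tmp on the local node.
local_tmp=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")

# Stand-in for the real computation: write intermediate data locally.
echo "intermediate results" > "$local_tmp/partial.dat"

# Copy what must survive out of /tmp, then clean up after yourself.
results_dir="${RCAC_SCRATCH:-$PWD}/myjob_results"
mkdir -p "$results_dir"
cp "$local_tmp/partial.dat" "$results_dir/"
rm -rf "$local_tmp"
```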

Long-Term Storage

Long-term Storage or Permanent Storage is available to ITaP research users on the High Performance Storage System (HPSS), an archival storage system, commonly referred to as "Fortress". HPSS is a software package that manages a hierarchical storage system. Program files, data files and any other files which are not used often, but which must be saved, can be put in permanent storage. Fortress currently has a 1.2 PB capacity.

Files smaller than 100 MB have their primary copy stored on low-cost disks (disk cache), but the second copy (backup of disk cache) is on tape or optical disks. This provides a rapid restore time to the disk cache. However, the large latency to access a larger file (usually involving a copy from a tape cartridge) makes it unsuitable for direct use by any processes or jobs, even where possible. The primary and secondary copies of larger files are stored on separate tape cartridges in the Quantum (ADIC, Advanced Digital Information Corporation) tape library.

To ensure optimal performance for all users, and to keep the Fortress system healthy, please remember the following tips:

  • Fortress operates most effectively with large files of 1 GB or larger. If your data consists of smaller files, use HTAR to directly create archives in Fortress.
  • When working with files on cluster head nodes, use your home directory or a scratch file system, rather than editing or computing on files directly in Fortress. Copy any data you wish to archive to Fortress after computation is complete.
  • The HPSS software does not handle sparse files (files with empty space) in an optimal manner. Therefore, if you must copy a sparse file into HPSS, use HSI rather than the cp or mv commands.
  • Due to the sparse files issue, the rsync command should not be used to copy data into Fortress through NFS, as this may cause problems with the system.

Fortress writes two copies of every file either to two tapes, or to disk and a tape, to protect against medium errors. Unfortunately, Fortress does not automatically switch to the alternate copy when it has trouble accessing the primary. If it seems to be taking an extraordinary amount of time to retrieve a file (hours), please either email rcac-help@purdue.edu or call ITaP Customer Service at 765-494-4000. We can then investigate why it is taking so long. If it is an error on the primary copy, we will instruct Fortress to switch to the alternate copy as the primary and recreate a new alternate copy.

For more information about Fortress, how it works, user guides, and how to obtain an account:

Manual File Transfer to Long-Term Storage

There are a variety of ways to manually transfer files to your Fortress home directory for long-term storage.

HSI

HSI, the Hierarchical Storage Interface, is the preferred method of transferring files to and from Fortress. HSI is designed to be a friendly interface for users of the High Performance Storage System (HPSS). It provides a familiar Unix-style environment for working within HPSS while automatically taking advantage of high-speed, parallel file transfers without requiring any special user knowledge.

HSI is already provided on all ITaP research systems as the command hsi. You may download HSI for the following platforms as well:

Any machines using HSI or HTAR must have all firewalls (local and departmental) configured to allow open access from the following IP addresses:

  • 128.210.251.141
  • 128.210.251.142
  • 128.210.251.143
  • 128.210.251.144
  • 128.210.251.145

If you are unsure of how to modify your firewall settings, please consult with your department's IT support or the documentation for your operating system. Access to Fortress is restricted to on-campus networks. If you need to directly access Fortress from off-campus, please use the Purdue VPN service before connecting.

Interactive usage:

$ hsi

*************************************************************************
*                    Purdue University 
*                  High Performance Storage System (HPSS)
*************************************************************************
* This is the Purdue Data Archive, Fortress.  For further information 
* see http://www.rcac.purdue.edu/userinfo/resources/fortress/
*  
*   If you are having problems with HPSS, please call IT/Operational
*   Services at 49-44000 or send E-mail to dxul-help@purdue.edu.
*
*************************************************************************
Username: myusername  UID: 12345  Acct: 12345(12345) Copies: 1 Firewall: off [hsi.3.5.8 Wed Sep 21 17:31:14 EDT 2011] 

[Fortress HSI]/home/myusername->put data1.fits
put  'data1.fits' : '/home/myusername/data1.fits' ( 1024000000 bytes, 250138.1 KBS (cos=11))

[Fortress HSI]/home/myusername->lcd /tmp

[Fortress HSI]/home/myusername->get data1.fits
get  '/tmp/data1.fits' : '/home/myusername/data1.fits' (2011/10/04 16:28:50 1024000000 bytes, 325844.9 KBS )

[Fortress HSI]/home/myusername->quit

Batch transfer file:

put data1.fits 
put data2.fits 
put data3.fits 
put data4.fits 
put data5.fits 
put data6.fits 
put data7.fits 
put data8.fits 
put data9.fits

Batch usage:

$ hsi < my_batch_transfer_file
*************************************************************************
*                    Purdue University 
*                  High Performance Storage System (HPSS)
*************************************************************************
* This is the Purdue Data Archive, Fortress.  For further information 
* see http://www.rcac.purdue.edu/userinfo/resources/fortress/
*  
*   If you are having problems with HPSS, please call IT/Operational
*   Services at 49-44000 or send E-mail to dxul-help@purdue.edu.
*
*************************************************************************
Username: myusername  UID: 12345  Acct: 12345(12345) Copies: 1 Firewall: off [hsi.3.5.8 Wed Sep 21 17:31:14 EDT 2011] 
put  'data1.fits' : '/home/myusername/data1.fits' ( 1024000000 bytes, 250200.7 KBS (cos=11))
put  'data2.fits' : '/home/myusername/data2.fits' ( 1024000000 bytes, 258893.4 KBS (cos=11))
put  'data3.fits' : '/home/myusername/data3.fits' ( 1024000000 bytes, 222819.7 KBS (cos=11))
put  'data4.fits' : '/home/myusername/data4.fits' ( 1024000000 bytes, 224311.9 KBS (cos=11))
put  'data5.fits' : '/home/myusername/data5.fits' ( 1024000000 bytes, 323707.3 KBS (cos=11))
put  'data6.fits' : '/home/myusername/data6.fits' ( 1024000000 bytes, 320322.9 KBS (cos=11))
put  'data7.fits' : '/home/myusername/data7.fits' ( 1024000000 bytes, 253192.6 KBS (cos=11))
put  'data8.fits' : '/home/myusername/data8.fits' ( 1024000000 bytes, 253056.2 KBS (cos=11))
put  'data9.fits' : '/home/myusername/data9.fits' ( 1024000000 bytes, 323218.9 KBS (cos=11))
EOF detected on TTY - ending HSI session
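
Rather than typing a batch transfer file by hand, you could generate it with a short loop. The function name below is hypothetical, and the sketch assumes the files to archive match *.fits in the current directory, as in the example above.

```shell
# Print one HSI "put" command per .fits file in the current directory.
make_hsi_batch() {
    for f in *.fits; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        printf 'put %s\n' "$f"
    done
}

make_hsi_batch > my_batch_transfer_file
# Then transfer as shown above: hsi < my_batch_transfer_file
```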

For more information about HSI:

HTAR

HTAR (short for "HPSS TAR") is a utility program that writes TAR-compatible archive files directly onto Fortress, without having to first create a local file. Its command line was originally based on the AIX tar program, with a number of extensions added to provide extra features.

HTAR is already provided on all ITaP research systems as the command htar. You may download HTAR for the following platforms as well:

Any machines using HSI or HTAR must have all firewalls (local and departmental) configured to allow open access from the following IP addresses:

  • 128.210.251.141
  • 128.210.251.142
  • 128.210.251.143
  • 128.210.251.144
  • 128.210.251.145

If you are unsure of how to modify your firewall settings, please consult with your department's IT support or the documentation for your operating system. Access to Fortress is restricted to on-campus networks. If you need to directly access Fortress from off-campus, please use the Purdue VPN service before connecting.

Usage:

  (Create a tar archive on Fortress named data.tar including all files with the extension ".fits".)
$ htar -cvf data.tar *.fits
HTAR: a   data1.fits                                      
HTAR: a   data2.fits
HTAR: a   data3.fits
HTAR: a   data4.fits
HTAR: a   data5.fits
HTAR: a   data6.fits
HTAR: a   data7.fits
HTAR: a   data8.fits
HTAR: a   data9.fits
HTAR: a   /tmp/HTAR_CF_CHK_17953_1317760775
HTAR Create complete for data.tar. 9,216,006,144 bytes written for 9 member files, max threads: 3 Transfer time: 29.622 seconds (311.121 MB/s)
HTAR: HTAR SUCCESSFUL   

  (Unpack a tar archive on Fortress named data.tar into a scratch directory for use in a batch job.)
$ cd $RCAC_SCRATCH/job_dir
$ htar -xvf data.tar 
HTAR: x data1.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data2.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data3.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data4.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data5.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data6.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data7.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data8.fits, 1024000000 bytes, 2000001 media blocks
HTAR: x data9.fits, 1024000000 bytes, 2000001 media blocks
HTAR: Extract complete for data.tar, 9 files. total bytes read: 9,216,004,608 in 33.914 seconds (271.749 MB/s )
HTAR: HTAR SUCCESSFUL

  (Look at the contents of the data.tar HTAR archive on Fortress.)
$ htar -tvf data.tar
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:30  data1.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data2.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data3.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data4.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data5.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data6.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data7.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data8.fits
HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data9.fits
HTAR: -rw-------  myusername/pucc        256 2011-10-04 16:39  /tmp/HTAR_CF_CHK_17953_1317760775
HTAR: Listing complete for data.tar, 10 files 10 total objects
HTAR: HTAR SUCCESSFUL

  (Unpack a single file, "data7.fits", from the tar archive on Fortress named data.tar into a scratch directory.)
$ htar -xvf data.tar data7.fits
HTAR: x data7.fits, 1024000000 bytes, 2000001 media blocks
HTAR: Extract complete for data.tar, 1 files. total bytes read: 1,024,000,512 in 3.642 seconds (281.166 MB/s )
HTAR: HTAR SUCCESSFUL
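
The create and list steps shown above can be combined in a small wrapper. The following is only a sketch: the function name archive_to_fortress and the HTAR override variable are hypothetical names introduced for illustration, and the sketch assumes that htar returns a non-zero exit status when it fails.

```shell
#!/bin/bash
# Sketch: archive files to Fortress with htar, then list the archive to confirm it.
# HTAR is a hypothetical override variable; set HTAR=echo to print the
# commands instead of running them (a dry run).
HTAR="${HTAR:-htar}"

archive_to_fortress() {
    local archive="$1"
    shift
    # Create the archive on Fortress from the remaining arguments
    $HTAR -cvf "$archive" "$@" || return 1
    # List the archive contents as a basic check that it was written
    $HTAR -tvf "$archive"
}
```

For example, "archive_to_fortress data.tar *.fits" would run the same two htar commands shown in the usage examples above.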

For more information about HTAR:

SCP

Fortress does NOT support SCP.

SFTP

Fortress does NOT support SFTP.

NFS

If you are using an ITaP research cluster front-end system, your Fortress home directory is available as /archive/fortress/home/myusername. While your Fortress home directory can be accessed via NFS in this way, this is only provided as a convenience and should not be used on a regular basis as it is extremely slow. Instead, use the HSI command to get a fast, parallelized, UNIX-like interface to your Fortress home directory.

Environment Variables

Many environment variables specify storage locations and paths. Your login automatically defines these variables for you. You may redefine them if necessary. In addition, you define many more environment variables when you load the modules of specific applications, such as compilers and MATLAB. (See the module command section for more information.)

Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change. Some of the environment variables you should have are:

Name             Description
USER             your username
HOME             path to your home directory
PWD              path to your current directory
RCAC_SCRATCH     path to scratch filesystem
PATH             all directories searched for commands/applications
HOSTNAME         name of the machine you are on
SHELL            your current shell (bash, tcsh, csh, ksh)
SSH_CLIENT       your local client's IP address
TERM             type of terminal or terminal emulator being used
OMP_NUM_THREADS  number of OpenMP threads

By convention, environment variable names are all uppercase. You may use them on the command line or in any scripts in place of and in combination with hard-coded values:

$ ls $HOME
...

$ ls $RCAC_SCRATCH/myproject
...

To find the value of any environment variable:

$ echo $RCAC_SCRATCH
/scratch/lustreA/m/myusername

$ echo $SHELL
/bin/tcsh

To list the values of all environment variables:

$ env
USER=myusername
HOME=/home/myusername
RCAC_SCRATCH=/scratch/lustreA/m/myusername
SHELL=/bin/tcsh
...

You may create or overwrite an environment variable. To pass (export) the value of a variable in either bash or ksh:

$ export VARIABLE=value

To assign a value to an environment variable in either tcsh or csh:

$ setenv VARIABLE value
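
In a bash script, an environment variable can be combined with a default value so that the script still runs where the variable is not defined. The following is a minimal sketch; the fallback to TMPDIR or /tmp and the directory name "myproject" are assumptions made for illustration, not ITaP conventions.

```shell
#!/bin/bash
# Use RCAC_SCRATCH when the login environment defines it; otherwise fall back
# to TMPDIR or /tmp (an assumption made so the sketch runs anywhere).
scratch_root="${RCAC_SCRATCH:-${TMPDIR:-/tmp}}"
workdir="$scratch_root/myproject"

# Create the working directory if it does not already exist
mkdir -p "$workdir"
echo "Working directory: $workdir"
```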

Storage Quotas / Limits

ITaP imposes some limits on your disk usage on research systems. Each filesystem (home directory, scratch directory, etc.) may have a different limit. ITaP does not implement a soft limit. However, if you exceed the hard limit (your quota), your write will fail. You can then either remove files you no longer need, move them to the Fortress HPSS Archive, or ask us about increasing your quota.

Checking Quota Usage

To discover the current quotas of your home and scratch directories:

$ myquota
Type        Filesystem          Size    Limit  Use         Files    Limit  Use
==============================================================================
home        extensible         5.0GB   10.0GB  50%             -        -   - 
scratch     /scratch/lustreA/    8KB  476.8GB   0%             2  100,000   0%

The columns are as follows:

  1. Type: indicates home or scratch directory.
  2. Filesystem: name of storage option.
  3. Size: sum of file sizes in bytes.
  4. Limit: allowed maximum on sum of file sizes in bytes.
  5. Use: percentage of file-size limit currently in use.
  6. Files: number of files and directories (not the size).
  7. Limit: allowed maximum on number of files and directories. It is possible, though unlikely, to reach this limit and not the file-size limit if you create a large number of very small files.
  8. Use: percentage of file-number limit currently in use.
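
The fifth column (percentage of the file-size limit in use) can be checked in a script. The following is a rough sketch that assumes the myquota output keeps the column layout shown above; the function name quota_warn and its default threshold are hypothetical.

```shell
#!/bin/bash
# Sketch: read myquota-style output on standard input and warn when the
# size "Use" percentage (5th field) reaches a threshold (default 90).
quota_warn() {
    local threshold="${1:-90}"
    awk -v max="$threshold" '
        # Data rows start with the storage type: home or scratch
        $1 == "home" || $1 == "scratch" {
            use = $5
            sub(/%/, "", use)      # strip the percent sign before comparing
            if (use + 0 >= max)
                print "WARNING: " $1 " is at " $5 " of its size limit"
        }'
}
```

A possible use is "myquota | quota_warn 80", which prints a warning line only for storage areas at or above 80% of their size limit.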

If you find that you have reached your quota in either your home directory or your scratch directory, obtain estimates of your disk usage. Find the top-level directories which have a high disk usage, then study the subdirectories to discover where the heaviest usage lies.

To see in a human-readable format an estimate of the disk usage of your top-level directories in your home directory:

$ du -h --max-depth=1 $HOME >myfile
$ cat myfile
32K /home/myusername/mysubdirectory_1
529M    /home/myusername/mysubdirectory_2
608K    /home/myusername/mysubdirectory_3

The second directory is the largest of the three, so run du on it next to examine its subdirectories.

To see in a human-readable format an estimate of the disk usage of the top-level directories in your scratch directory:

$ du -h --max-depth=1 $RCAC_SCRATCH >myfile
$ cat myfile
160K    /scratch/lustreA/m/myusername

This strategy can be very helpful in figuring out the location of your largest usage. Move unneeded files and directories to alternate long-term storage to free space in your home and scratch directories.
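
Sorting the du output puts the heaviest directories first, which shortens the search. The following is a minimal sketch that assumes GNU du and sort (as found on Linux systems); the function name largest_dirs is hypothetical.

```shell
#!/bin/bash
# Sketch: list the largest entries one level below a directory, biggest first.
# The first line of output is the total for the directory itself.
largest_dirs() {
    local dir="${1:-$HOME}"
    local count="${2:-10}"
    # -k reports sizes in KB so that sort can compare them numerically
    du -k --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, "largest_dirs $RCAC_SCRATCH 5" would show the total and the four largest top-level directories in your scratch directory.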

Increasing Your Storage Quota

If you find you need additional disk space in your home directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may go to the BoilerBackpack Quota Management site and use the sliders there to increase the amount of space allocated to your research home directory vs. other storage options, up to a maximum of 100GB.

Archive and Compression

There are several options for archiving and compressing groups of files or directories on ITaP research systems. ITaP provides the following tools:

  • zip   (more information)
    Simple compression and file packaging utility.
    Examples:
      (extract contents of somefile.zip)
    $ unzip somefile.zip
    
      (compress file somefile.c)
    $ zip somefile.zip somefile.c
    
      (compress all files in a directory into one archive file)
    $ zip -r somefile.zip somedirectory/
    
      (compress all ".c" files in current directory into one archive file)
    $ zip -r somefile.zip . -i \*.c
    
  • 7zip   (more information)
    Simple compression and file packaging utility which offers much better compression than zip.
    Examples:
      (extract contents of somefile.7z)
    $ 7za e somefile.7z
    
      (compress file somefile.c)
    $ 7za a somefile.7z somefile.c
    
      (compress all files in a directory into one archive file)
    $ 7za a somefile.7z somedirectory/
    
      (compress all ".c" files in current directory into one archive file)
    $ 7za a somefile.7z *.c
    
  • tar   (more information)
    Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.
    Examples:
      (list contents of archive somefile.tar)
    $ tar tvf somefile.tar
    
      (extract contents of somefile.tar)
    $ tar xvf somefile.tar
    
      (extract contents of gzipped archive somefile.tar.gz)
    $ tar xzvf somefile.tar.gz
    
      (extract contents of bzip2 archive somefile.tar.bz2)
    $ tar xjvf somefile.tar.bz2
    
      (extract contents of xz archive somefile.tar.xz)
    $ tar xJvf somefile.tar.xz
    
      (archive file somefile.c)
    $ tar cvf somefile.tar somefile.c
    
      (archive all ".c" files in current directory into one archive file)
    $ tar cvf somefile.tar *.c
    
      (archive all files in a directory into one archive file)
    $ tar cvf somefile.tar somedirectory/
    
      (archive and gzip-compress all files in a directory into one archive file)
    $ tar czvf somefile.tar.gz somedirectory/
    
      (archive and bzip2-compress all files in a directory into one archive file)
    $ tar cjvf somefile.tar.bz2 somedirectory/
    
      (archive and xz-compress all files in a directory into one archive file)
    $ tar cJvf somefile.tar.xz somedirectory/
    
  • gzip   (more information)
    Compression utility designed as a replacement for compress, with much better compression and no patented algorithms. The standard compression system for all GNU software.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ gzip somefile
    
      (uncompress file somefile.gz - also removes compressed file)
    $ gunzip somefile.gz
    
  • bzip2   (more information)
    Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ bzip2 somefile
    
      (uncompress file somefile.bz2 - also removes compressed file)
    $ bunzip2 somefile.bz2
    
  • xz   (more information)
    Strong, lossless data compressor based on the LZMA2 compression algorithm. Stronger compression than gzip or bzip2.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ xz somefile
    
      (uncompress file somefile.xz - also removes compressed file)
    $ unxz somefile.xz
    
  • compress   (more information)
    Adaptive Lempel-Ziv compressor. Not often used today.
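
Before deleting the original files, it is worth confirming that a new archive is readable. The following is a small sketch using the tar options listed above; the function name archive_dir is hypothetical.

```shell
#!/bin/bash
# Sketch: create a gzip-compressed tar archive of a directory, then list it
# to confirm the archive can be read back.
archive_dir() {
    local src="$1"
    local out="$2"
    # -C changes to the parent directory so that archive paths are relative
    tar czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # A successful listing is a basic check that the archive is intact
    tar tzf "$out" > /dev/null
}
```

For example, "archive_dir somedirectory somefile.tar.gz" returns success only if the archive was both written and listed without error.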

Windows users can work with these same formats using some of the following software:

  • 7-Zip
    Free Windows software package that can handle all the above formats.
  • WinZip
    Commercial Windows software package that can handle all the above formats.
  • WinRAR
    Commercial Windows software package that can handle all the above formats.

File Transfer

There are a variety of ways to transfer data to and from ITaP research systems. Which you should use depends on several factors, including the ease of use for you personally, connection speed and bandwidth, and the size and number of files which you intend to transfer.
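
Since the size and number of files influence the choice of method, a quick summary of a directory can help before starting a transfer. The following is only a sketch; the function name transfer_summary and its output format are hypothetical.

```shell
#!/bin/bash
# Sketch: report how many regular files a directory holds and their total
# size in KB, to help choose a transfer method.
transfer_summary() {
    local dir="${1:-.}"
    local files size_kb
    files=$(find "$dir" -type f | wc -l)
    size_kb=$(du -sk "$dir" | cut -f1)
    # $((files)) strips any padding that wc may add around the count
    echo "files: $((files)) total_kb: $size_kb"
}
```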

FTP

FTP (File Transfer Protocol) is a simple data transfer mechanism. FTP does not provide secure communications, so ITaP no longer supports FTP on any ITaP research systems. However, most modern FTP clients support either SFTP or SCP, which are similar, secure protocols for file transfer. Try using one of the other methods described here instead of FTP.

SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH (Secure SHell) protocol. You may use SCP to connect to any system where you have SSH (login) access. SCP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, and will also recursively copy directory contents if you give it the -r option and a directory name.

Command-line usage:

  (to a remote system from local)
$ scp sourcefilename myusername@hostname:somedirectory/destinationfilename

  (from a remote system to local)
$ scp myusername@hostname:somedirectory/sourcefilename destinationfilename

  (recursive directory copy to a remote system from local)
$ scp -r sourcedirectory/ myusername@hostname:somedirectory/

Linux / Solaris / AIX / HP-UX / Unix:

  • You should have already installed the "scp" command-line program.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SCP and SFTP client.
  • PuTTY also offers "pscp.exe", which is an extremely small program and a basic SCP client.
  • Secure FX is a commercial SCP and SFTP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • You should have already installed the "scp" command-line program. You may start a local terminal window from "Applications->Utilities".

SFTP

SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. You may use SFTP to connect to most ITaP research systems. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Command-line SFTP cannot recursively copy directory contents; to do so, try using SCP or a graphical SFTP client.

Command-line usage:

$ sftp -B buffersize myusername@hostname

      (to a remote system from local)
sftp> put sourcefile somedir/destinationfile
sftp> put -P sourcefile somedir/

      (from a remote system to local)
sftp> get sourcefile somedir/destinationfile
sftp> get -P sourcefile somedir/

sftp> exit

  • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
  • -P: optional, preserve file attributes and permissions

Linux / Solaris / AIX / HP-UX / Unix:

  • The "sftp" command line program should already be installed.

Microsoft Windows:

  • WinSCP is a full-featured and free graphical SFTP and SCP client.
  • PuTTY also offers "psftp.exe", which is an extremely small program and a basic SFTP client.
  • Secure FX is a commercial SFTP and SCP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

Mac OS X:

  • The "sftp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • MacSFTP is a free graphical SFTP client for Macs.

LFTP

LFTP is a command-line file-transfer program for Linux and Unix systems. It supports SFTP, HTTP, and HTTPS file transfers. LFTP has additional features not provided by SFTP, such as bandwidth throttling, transfer queues, and parallel transfers. You may use it interactively or in scripts.

LFTP with parallel transfers can be much faster than SCP or SFTP, so ITaP encourages its use, when possible.

LFTP is available only on some ITaP research systems. However, it is simply a client, so the remote machine involved in a transfer does not need it (the remote system need only support SFTP).

Interactive usage:

$ lftp myusername@hostname

         (transfer all ".dat" files from remote system to local)
lftp :~> mget *.dat

         (transfer "filename.dat" file from local system to remote)
lftp :~> put filename.dat

         (transfer a directory and all contents from remote
          system to local, using 5 connections in parallel)
lftp :~> mirror --parallel=5 remotedirectory localdirectory/

         (transfer a directory and all contents from local
          system to remote, using 8 connections in parallel)
lftp :~> mirror -R --parallel=8 localdirectory remotedirectory/

Batch usage:

  (specify all actions on command line)
$ lftp myusername@hostname -e "mget *.dat; exit"

  (specify all actions in the script file "mytransfer.lftp")
$ lftp myusername@hostname -f mytransfer.lftp
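
The script file named above holds the same commands you would type at the lftp prompt, one per line. The following is a sketch of how such a file could be created from the shell, using only commands from the interactive examples; the directory names are placeholders.

```shell
#!/bin/bash
# Sketch: write an lftp script file containing commands from the
# interactive examples. The quoted 'EOF' keeps the shell from expanding *.dat.
cat > mytransfer.lftp <<'EOF'
mget *.dat
mirror --parallel=5 remotedirectory localdirectory/
EOF

# Show the script that lftp would read with the -f option
cat mytransfer.lftp
```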

GridFTP

GridFTP is a fast method of transferring large files that uses Globus authentication credentials (X.509 certificates). GridFTP is available on some ITaP resources, but only to users who are members of a Grid project, such as TeraGrid, NorthWest Indiana Computational Grid (NWICG), or Open Science Grid (OSG). However, not all grids may access all ITaP resources.

For more information about how to use GridFTP, consult documentation for your participating grid.

Windows Network Drive / SMB

SMB (Server Message Block), also known as CIFS, is an easy-to-use file transfer protocol that is useful for transferring files between ITaP research systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer, but it can also be used from the command line.

Windows:

  • Click Windows menu > Computer (Vista/7) or Start > My Computer (XP)
  • Click Map Network Drive in the top bar (Vista/7) or Tools > Map Network Drive (XP)
  • In the folder location enter the following information and click Finish:

    • To access your home directory, enter \\samba.rcac.purdue.edu\myusername where myusername is your career account name.
    • To access your scratch storage, enter the following:

      • For Steele or Radon scratch, enter \\samba.rcac.purdue.edu\scratch9N\m\myusername where m is the first letter of your username, myusername is your career account name, and N is the number of your scratch drive (N can be 5, 6, 8, or 9). You need to know beforehand which scratch drive your directory is on, for example scratch95.
      • For Coates or Rossmann scratch, enter \\samba.rcac.purdue.edu\lustreA\m\myusername where m is the first letter of your username and myusername is your career account name.
      • For Carter, Hansen, or WinHPC scratch, enter \\samba.rcac.purdue.edu\lustreC\m\myusername where m is the first letter of your username and myusername is your career account name.

    • To access Fortress long-term storage, enter \\fortress-smb.rcac.purdue.edu\myusername where myusername is your career account name.

  • You may be prompted for login information. Enter your username as onepurdue\myusername and your account password. If you omit the onepurdue prefix, you will not be able to log in.
  • Your home, scratch, or Fortress directory should now be mounted as a drive in the Computer window.

Mac OS X:

  • In the Finder, click Go > Connect to Server (or the Command-K shortcut)
  • In the Server Address enter the following information and click Connect:

    • To access your home directory, enter smb://samba.rcac.purdue.edu/myusername where myusername is your career account name.
    • To access your scratch storage, enter the following:

      • For Steele or Radon scratch, enter smb://samba.rcac.purdue.edu/scratch9N/m/myusername where m is the first letter of your username, myusername is your career account name, and N is the number of your scratch drive (N can be 5, 6, 8, or 9). You need to know beforehand which scratch drive your directory is on, for example scratch95.
      • For Coates or Rossmann scratch, enter smb://samba.rcac.purdue.edu/lustreA/m/myusername where m is the first letter of your username and myusername is your career account name.
      • For Carter, Hansen, or WinHPC scratch, enter smb://samba.rcac.purdue.edu/lustreC/m/myusername where m is the first letter of your username and myusername is your career account name.

    • To access Fortress long-term storage, enter smb://fortress-smb.rcac.purdue.edu/myusername where myusername is your career account name.

  • You may be prompted for login information. Enter your username and password, and enter onepurdue for the Domain; without it, you will not be able to log in.

Linux:

  • There are several graphical methods to connect in Linux, depending on your desktop environment. Once you find out how to connect to a network server in your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
  • If you would like to use SMB from the command line, you may install smbclient, which gives you FTP-like access as shown below. ITaP recommends SCP or SFTP over this method. For the other locations you can connect to, adapt the addresses in the Mac OS X instructions.
    $ smbclient //samba.rcac.purdue.edu/myusername -U myusername -W onepurdue

Applications on Rossmann

Provided Applications

The following table lists the third-party software which ITaP has installed on its research systems. Additional software may be available. To see the software on a specific system, run the command module avail on that system. Please contact rcac-help@purdue.edu if you are interested in the availability of software not shown in this list.

Software Radon Steele Coates, Rossmann, Hansen & Carter Peregrine 1
Abaqus ¹
AcGrace
Amber ¹
Ann
ANSYS ¹
ATK
Antelope
Auto3Dem
ATLAS
BinUtils
BLAST
Boost
Cairo
CDAT
CGNSLib
Cmake
COMSOL ²
CPLEX ¹
DX
Eman
Eman2
Ferret
FFMPEG
FFTW
FLUENT ¹
GAMESS
GAMS
Gaussian ¹
GCC (Compilers)
GDAL
GemPak
Git
GLib
GMP
GMT
GrADS
GROMACS
GS
GSL
GTK+
GTKGlarea
Guile
HarminV
HDF4
HDF5
Hy3S
ImageMagick
IMSL ¹
Intel Compilers ¹
Jackal ²
Jasper
Java
LAMMPS
LibCTL
LibPNG
LibTool
LoopyMod ²
Maple ¹
Mathematica ¹
MATLAB ¹
Meep
MoPac
MPB
MPFR
MPICH
MPICH2
MPIExec
MrBayes
MUMPS
MVAPICH2
NAMD
NCL
NCO
NCView
NetCDF
NETPBM
NWChem
Octave
OpenMPI
Pango
Petsc
PGI Compilers ¹
Phrap
Pixman
PKG-Config
Proj
Python
QTLC
Rational
R
SAC
SAS ¹
ScaLAPACK
Seismic
Subversion
SWFTools
Swig
SysTools
Tao
TecPlot ²
TotalView ¹
UDUNITS
Valgrind
VMD
Weka

¹ Only users on Purdue's West Lafayette campus may use this software.
² Only specific research groups may use this software.

Please contact rcac-help@purdue.edu for specific questions about software license restrictions on ITaP research systems.

Environment Management with the Module Command

ITaP uses the module command as the preferred method to manage your processing environment. With this command, you may load applications and compilers along with their libraries and paths. Modules are packages which you load and unload as needed.

Please use the module command and do not manually configure your environment, as ITaP staff will frequently make changes to the specifics of various packages. If you use the module command to manage your environment, these changes will not be noticeable.

To view a brief usage report:

$ module

A short introduction to the module command follows. You can find more in the man page for module.

List Available Modules

To see what modules are available on this system:

$ module avail

To see which versions of a specific compiler are available on this system:

$ module avail gcc
$ module avail intel
$ module avail pgi

To see available modules with MPI:

$ module avail mvapich
$ module avail openmpi

To see available modules for specific provided applications, use names from the list obtained with the command module avail:

$ module avail abaqus
$ module avail matlab
$ module avail mathematica

Load / Unload a Module

All modules consist of both a name and a version number. When loading a module, you may use only the name to load the default version, or you may specify which version you wish to load.

For each cluster, ITaP makes a recommendation regarding the set of compiler, math library, and message-passing library for parallel code. To load the recommended set:

$ module load devel

To verify what you loaded:

$ module list

To load the default version of a specific compiler, choose one of the following commands:

$ module load gcc
$ module load intel
$ module load pgi

To load a specific version of the Intel compiler, include the version number:

$ module load intel/11.1.072

When running a job, you must use the job submission file to load on the compute node(s) any relevant modules. Loading modules on the front end before submitting your job is sufficient when using the front end during the development phase of your application but not sufficient when using the compute node(s) during the production phase. You must load the same modules on the compute node(s).
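
As a rough illustration of this point, the following sketch writes a job submission file that loads modules before running a program. The file name myjob.sub and the program name myprogram are placeholders, and scheduler directives are deliberately left out because they depend on the batch system.

```shell
#!/bin/bash
# Sketch: create a job submission file that loads the recommended modules
# on the compute node before running the program. The quoted 'EOF' writes
# the lines to the file exactly as shown.
cat > myjob.sub <<'EOF'
#!/bin/sh
# Load the same modules that were loaded on the front end when building
module load devel
# Record the loaded modules in the job output for later reference
module list
# Run the program (myprogram is a placeholder name)
./myprogram
EOF
```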

To unload a module, enter the same module name used to load that module. Unloading will attempt to undo the environmental changes which a previous load command installed.

To unload the default version of a specific compiler:

$ module unload gcc
$ module unload intel
$ module unload pgi

To unload a specific version of the Intel compiler, include the same version number used to load that Intel compiler:

$ module unload intel/11.1.072

Apply the same methods to manage the modules of provided applications:

$ module load matlab
$ module unload matlab

To unload all currently loaded modules:

$ module purge

List Currently Loaded Modules

To see currently loaded modules:

$ module list
Currently Loaded Modulefiles:
  1) intel/12.1

To unload a module:

$ module unload intel
$ module list
No Modulefiles Currently Loaded.

Show Module Details

To learn more about what a module does to your environment, you may use the module show module_name command, where module_name is any name in the list from command module avail. This can be either a default name like "intel", "gcc", "pgi", and "matlab", or a specific version of a module, such as "intel/11.1.072". Here is an example showing what loading the default Intel compiler does to the processing environment:

$ module show intel
-------------------------------------------------------------------
/opt/modules/modulefiles/intel/12.1:

module-whatis    invoke Intel 12.1.0 Compilers (64-bit) 
prepend-path     PATH /opt/intel/composer_xe_2011_sp1.6.233/bin/intel64 
prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 
prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 
prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
prepend-path     LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21 
prepend-path     NLSPATH /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/locale/%l_%t/%N 
prepend-path     NLSPATH /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64/locale/%l_%t/%N 
prepend-path     CPATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/include 
setenv           CC icc 
setenv           CXX icpc 
setenv           FC ifort 
setenv           ICC_HOME /opt/intel/composer_xe_2011_sp1.6.233 
setenv           IFORT_HOME /opt/intel/composer_xe_2011_sp1.6.233 
setenv           MKL_HOME /opt/intel/composer_xe_2011_sp1.8.273/mkl 
setenv           TBBROOT /opt/intel/composer_xe_2011_sp1.6.233/tbb 
setenv           LAPACK_INCLUDE -I/opt/intel/composer_xe_2011_sp1.8.273/mkl/include 
setenv           LAPACK_INCLUDE_F95 -I/opt/intel/composer_xe_2011_sp1.8.273/mkl/include/intel64/lp64 
setenv           LINK_LAPACK -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
setenv           LINK_LAPACK_STATIC -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Bstatic -Wl,--start-group /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
setenv           LINK_LAPACK95 -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
setenv           LINK_LAPACK95_STATIC -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -Bstatic -Wl,--start-group /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 
-------------------------------------------------------------------

To show what loading a specific Intel compiler version does to the processing environment:

$ module show intel/11.1.072
-------------------------------------------------------------------
/opt/modules/modulefiles/intel/11.1.072:

module-whatis    invoke Intel 11.1.072 64-bit Compilers 
prepend-path     PATH /opt/intel/Compiler/11.1/072/bin/intel64 
prepend-path     LD_LIBRARY_PATH /opt/intel/mkl/10.2.5.035/lib/em64t 
prepend-path     LD_LIBRARY_PATH /opt/intel/Compiler/11.1/072/lib/intel64 
prepend-path     NLSPATH /opt/intel/mkl/10.2.5.035/lib/em64t/locale/%l_%t/%N 
prepend-path     NLSPATH /opt/intel/Compiler/11.1/072/idb/intel64/locale/%l_%t/%N 
prepend-path     NLSPATH /opt/intel/Compiler/11.1/072/lib/intel64/locale/%l_%t/%N 
setenv           CC icc 
setenv           CXX icpc 
setenv           FC ifort 
setenv           F90 ifort 
setenv           ICC_HOME /opt/intel/Compiler/11.1/072 
setenv           IFORT_HOME /opt/intel/Compiler/11.1/072 
setenv           MKL_HOME /opt/intel/mkl/10.2.5.035 
setenv           LAPACK_INCLUDE -I/opt/intel/mkl/10.2.5.035/include 
setenv           LAPACK_INCLUDE_F95 -I/opt/intel/mkl/10.2.5.035/include/em64t/lp64 
setenv           LINK_LAPACK -L/opt/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/mkl/10.2.5.035/lib/em64t 
setenv           LINK_LAPACK_STATIC -Bstatic -Wl,--start-group /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread 
setenv           LINK_LAPACK95 -L/opt/intel/mkl/10.2.5.035/lib/em64t -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/mkl/10.2.5.035/lib/em64t 
setenv           LINK_LAPACK95_STATIC -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -Bstatic -Wl,--start-group /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread 
-------------------------------------------------------------------

Compiling Source Code on Rossmann

Provided Compilers

Compilers are available on Rossmann for Fortran 77, Fortran 90, Fortran 95, C, and C++. The compilers can produce general-purpose and architecture-specific optimizations to improve performance. These include loop-level optimizations, inter-procedural analysis and cache optimizations. The compilers support automatic and user-directed parallelization of Fortran, C, and C++ applications for multiprocessing execution. More detailed documentation on each compiler set available on Rossmann follows.

On Rossmann, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

  • PGI 11.8
  • ACML
  • OpenMPI 1.4.4

To load the recommended set:

$ module load devel
$ module list

Intel Compiler Set

One or more versions of the Intel compiler set (compilers and associated libraries) are available on Rossmann. To discover which ones:

$ module avail intel/
$ module avail openmpi
$ module avail mvapich
$ module avail mpich

Choose an appropriate Intel module and load it. For example:

$ module load intel

Here are some examples for the Intel compilers:

Language     Program Type   Command
Fortran77    Serial         $ ifort myprogram.f -o myprogram
             MPI            $ mpif77 myprogram.f -o myprogram
             OpenMP         $ ifort -openmp myprogram.f -o myprogram
Fortran90    Serial         $ ifort myprogram.f90 -o myprogram
             MPI            $ mpif90 myprogram.f90 -o myprogram
             OpenMP         $ ifort -openmp myprogram.f90 -o myprogram
Fortran95    (same as Fortran 90)
C            Serial         $ icc myprogram.c -o myprogram
             MPI            $ mpicc myprogram.c -o myprogram
             OpenMP         $ icc -openmp myprogram.c -o myprogram
C++          Serial         $ icpc myprogram.cpp -o myprogram
             MPI            $ mpiCC myprogram.cpp -o myprogram
             OpenMP         $ icpc -openmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, or online here:

For more documentation on the Intel compilers:

GNU Compiler Set

The official name of the GNU compilers is "GNU Compiler Collection" or "GCC". One or more versions of the GNU compiler set (compilers and associated libraries) are available on Rossmann. To discover which ones:

$ module avail gcc
$ module avail openmpi
$ module avail mvapich
$ module avail mpich

Choose an appropriate GCC module and load it. For example:

$ module load gcc

An older version of the GNU compiler will be in your path by default. Do NOT use this version. Instead, load a newer version using the command module load gcc.

Here are some examples for the GNU compilers:

Language     Program Type   Command
Fortran77    Serial         $ gfortran myprogram.f -o myprogram
             MPI            $ mpif77 myprogram.f -o myprogram
             OpenMP         $ gfortran -fopenmp myprogram.f -o myprogram
Fortran90    Serial         $ gfortran myprogram.f90 -o myprogram
             MPI            $ mpif90 myprogram.f90 -o myprogram
             OpenMP         $ gfortran -fopenmp myprogram.f90 -o myprogram
Fortran95    Serial         $ gfortran myprogram.f95 -o myprogram
             MPI            $ mpif90 myprogram.f95 -o myprogram
             OpenMP         $ gfortran -fopenmp myprogram.f95 -o myprogram
C            Serial         $ gcc myprogram.c -o myprogram
             MPI            $ mpicc myprogram.c -o myprogram
             OpenMP         $ gcc -fopenmp myprogram.c -o myprogram
C++          Serial         $ g++ myprogram.cpp -o myprogram
             MPI            $ mpiCC myprogram.cpp -o myprogram
             OpenMP         $ g++ -fopenmp myprogram.cpp -o myprogram

More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, or online here:

For more documentation on the GCC compilers:

PGI Compiler Set

One or more versions of the PGI compiler set (compilers and associated libraries) are available on Rossmann. To discover which ones:

$ module avail pgi
$ module avail openmpi
$ module avail mvapich
$ module avail mpich

Choose an appropriate PGI module and load it. For example:

$ module load pgi

Here are some examples for the PGI compilers:

Language     Program Type   Command
Fortran77    Serial         $ pgf77 myprogram.f -o myprogram
             MPI            $ mpif77 myprogram.f -o myprogram
             OpenMP         $ pgf77 -mp myprogram.f -o myprogram
Fortran90    Serial         $ pgf90 myprogram.f90 -o myprogram
             MPI            $ mpif90 myprogram.f90 -o myprogram
             OpenMP         $ pgf90 -mp myprogram.f90 -o myprogram
Fortran95    Serial         $ pgf95 myprogram.f95 -o myprogram
             MPI            $ mpif90 myprogram.f95 -o myprogram
             OpenMP         $ pgf95 -mp myprogram.f95 -o myprogram
C            Serial         $ pgcc myprogram.c -o myprogram
             MPI            $ mpicc myprogram.c -o myprogram
             OpenMP         $ pgcc -mp myprogram.c -o myprogram
C++          Serial         $ pgCC myprogram.cpp -o myprogram
             MPI            $ mpiCC myprogram.cpp -o myprogram
             OpenMP         $ pgCC -mp myprogram.cpp -o myprogram

More information on compiler options can be found in the official man pages, which are accessible with the man command after loading the appropriate compiler module, or online here:

For more documentation on the PGI compilers:

Compiling Serial Programs

A serial program is a single process which executes as a sequential stream of instructions on one computer. Compilers capable of serial programming are available for C, C++, and versions of Fortran.

Here are a few sample serial programs:

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load pgi

The following table illustrates how to compile your serial program:

Language     Compiler   Command
Fortran 77   Intel      $ ifort myprogram.f -o myprogram
             GNU        $ gfortran myprogram.f -o myprogram
             PGI        $ pgf77 myprogram.f -o myprogram
Fortran 90   Intel      $ ifort myprogram.f90 -o myprogram
             GNU        $ gfortran myprogram.f90 -o myprogram
             PGI        $ pgf90 myprogram.f90 -o myprogram
Fortran 95   Intel      $ ifort myprogram.f90 -o myprogram
             GNU        $ gfortran myprogram.f95 -o myprogram
             PGI        $ pgf95 myprogram.f95 -o myprogram
C            Intel      $ icc myprogram.c -o myprogram
             GNU        $ gcc myprogram.c -o myprogram
             PGI        $ pgcc myprogram.c -o myprogram
C++          Intel      $ icpc myprogram.cpp -o myprogram
             GNU        $ g++ myprogram.cpp -o myprogram
             PGI        $ pgCC myprogram.cpp -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".
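Because a successful compilation prints nothing, the exit status of the compiler is the dependable signal when you compile from a script. The following is a minimal sketch, not a tool provided on Rossmann: the function name compile_and_report is hypothetical, and it simply runs whatever compile command you pass to it.

```shell
# compile_and_report (hypothetical helper): run a compile command and
# report success or failure based on its exit status.
compile_and_report() {
    if "$@"; then
        echo "compile succeeded"
    else
        status=$?
        echo "compile failed (exit status $status)"
        return "$status"
    fi
}

# Example use after loading a compiler module:
#   compile_and_report gcc myprogram.c -o myprogram
```

The helper returns the compiler's own exit status, so it can be chained with && in a longer build script.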

Compiling MPI Programs

A message-passing program is a set of processes (often multiple copies of a single process) that take advantage of distributed-memory systems by communicating with each other via the sending and receiving of messages. The Message-Passing Interface (MPI) is a standardized interface for the message-passing model, provided as a collection of library functions. Open MPI, MPICH2 and MVAPICH2 are three implementations of the MPI-2 standard. Libraries for Open MPI, MPICH2 and MVAPICH2 and compilers for C, C++, and versions of Fortran are available.

MPI programs require including a header file:

Language     Header Files
Fortran 77   INCLUDE 'mpif.h'
Fortran 90   INCLUDE 'mpif.h'
Fortran 95   INCLUDE 'mpif.h'
C            #include <mpi.h>
C++          #include <mpi.h>

Here are a few sample programs using MPI:

To see the available MPI libraries:

$ module avail openmpi
$ module avail mvapich
$ module avail mpich

The following table illustrates how to compile your message-passing program. Any compiler flags accepted by ifort/icc compilers are compatible with mpif77/mpicc.

Language     Compiler   Command
Fortran 77   Intel      $ mpif77 myprogram.f -o myprogram
             GNU        $ mpif77 myprogram.f -o myprogram
             PGI        $ mpif77 myprogram.f -o myprogram
Fortran 90   Intel      $ mpif90 myprogram.f90 -o myprogram
             GNU        $ mpif90 myprogram.f90 -o myprogram
             PGI        $ mpif90 myprogram.f90 -o myprogram
Fortran 95   Intel      $ mpif90 myprogram.f90 -o myprogram
             GNU        $ mpif90 myprogram.f95 -o myprogram
             PGI        $ mpif90 myprogram.f95 -o myprogram
C            Intel      $ mpicc myprogram.c -o myprogram
             GNU        $ mpicc myprogram.c -o myprogram
             PGI        $ mpicc myprogram.c -o myprogram
C++          Intel      $ mpiCC myprogram.C -o myprogram
             GNU        $ mpiCC myprogram.C -o myprogram
             PGI        $ mpiCC myprogram.C -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Here is some more documentation from other sources on the MPI libraries:

Compiling OpenMP Programs

A shared-memory program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. Open Multi-Processing (OpenMP) is a standardized programming interface for the shared-memory model, consisting of parallelization directives, library routines, and environment variables. It distributes the work of a process over several cores of a multi-core processor. Compilers which include OpenMP are available for C, C++, and versions of Fortran.

OpenMP programs require including a header file:

Language     Header Files
Fortran 77
Fortran 90   use omp_lib
Fortran 95   use omp_lib
C            #include <omp.h>
C++          #include <omp.h>

Sample programs illustrate task parallelism of OpenMP:

A sample program illustrates loop-level (data) parallelism of OpenMP:

To load a compiler, enter one of the following:

$ module load intel
$ module load gcc
$ module load pgi

The following table illustrates how to compile your shared-memory program. Any compiler flags accepted by ifort/icc compilers are compatible with OpenMP.

Language     Compiler   Command
Fortran 77   Intel      $ ifort -openmp myprogram.f -o myprogram
             GNU        $ gfortran -fopenmp myprogram.f -o myprogram
             PGI        $ pgf77 -mp myprogram.f -o myprogram
Fortran 90   Intel      $ ifort -openmp myprogram.f90 -o myprogram
             GNU        $ gfortran -fopenmp myprogram.f90 -o myprogram
             PGI        $ pgf90 -mp myprogram.f90 -o myprogram
Fortran 95   Intel      $ ifort -openmp myprogram.f90 -o myprogram
             GNU        $ gfortran -fopenmp myprogram.f95 -o myprogram
             PGI        $ pgf95 -mp myprogram.f95 -o myprogram
C            Intel      $ icc -openmp myprogram.c -o myprogram
             GNU        $ gcc -fopenmp myprogram.c -o myprogram
             PGI        $ pgcc -mp myprogram.c -o myprogram
C++          Intel      $ icpc -openmp myprogram.cpp -o myprogram
             GNU        $ g++ -fopenmp myprogram.cpp -o myprogram
             PGI        $ pgCC -mp myprogram.cpp -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Here is some more documentation from other sources on OpenMP:

Compiling Hybrid Programs

A hybrid program combines both message-passing and shared-memory attributes to take advantage of compute clusters with multi-core compute nodes. Libraries for Open MPI, MPICH2, and MVAPICH2 and compilers which include OpenMP for C, C++, and versions of Fortran are available.

Hybrid programs require including header files:

Language     Header Files
Fortran 77   INCLUDE 'mpif.h'
Fortran 90   use omp_lib
             INCLUDE 'mpif.h'
Fortran 95   use omp_lib
             INCLUDE 'mpif.h'
C            #include <mpi.h>
             #include <omp.h>
C++          #include <mpi.h>
             #include <omp.h>

A few examples illustrate hybrid programs with task parallelism of OpenMP:

This example illustrates a hybrid program with loop-level (data) parallelism of OpenMP:

To see the available MPI libraries:

$ module avail openmpi
$ module avail mvapich
$ module avail mpich

The following table illustrates how to compile your hybrid (MPI/OpenMP) program. Any compiler flags accepted by ifort/icc compilers are compatible with mpif77/mpicc and OpenMP.

Language     Compiler   Command
Fortran 77   Intel      $ mpif77 -openmp myprogram.f -o myprogram
             GNU        $ mpif77 -fopenmp myprogram.f -o myprogram
             PGI        $ mpif77 -mp myprogram.f -o myprogram
Fortran 90   Intel      $ mpif90 -openmp myprogram.f90 -o myprogram
             GNU        $ mpif90 -fopenmp myprogram.f90 -o myprogram
             PGI        $ mpif90 -mp myprogram.f90 -o myprogram
Fortran 95   Intel      $ mpif90 -openmp myprogram.f90 -o myprogram
             GNU        $ mpif90 -fopenmp myprogram.f95 -o myprogram
             PGI        $ mpif90 -mp myprogram.f95 -o myprogram
C            Intel      $ mpicc -openmp myprogram.c -o myprogram
             GNU        $ mpicc -fopenmp myprogram.c -o myprogram
             PGI        $ mpicc -mp myprogram.c -o myprogram
C++          Intel      $ mpiCC -openmp myprogram.C -o myprogram
             GNU        $ mpiCC -fopenmp myprogram.C -o myprogram
             PGI        $ mpiCC -mp myprogram.C -o myprogram

The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

Provided Libraries

Some mathematical libraries are available on Rossmann. More detailed documentation about the libraries available on Rossmann follows.

Intel Math Kernel Library (MKL)

Intel Math Kernel Library (MKL) contains ScaLAPACK, LAPACK, Sparse Solver, BLAS, Sparse BLAS, CBLAS, GMP, FFTs, DFTs, VSL, VML, and Interval Arithmetic routines. MKL resides in the directory /opt/intel/mkl/9.1, and it has the following subdirectory structure:

  • lib/32    Libraries for 32-bit Applications
    • libmkl_ia32.a    Optimized Kernels (BLAS, CBLAS, Sparse BLAS, GMP, FFTs, DFTs, VML, VSL, Interval Arithmetic)
    • libmkl_lapack.a    LAPACK Routines
    • libmkl_lapack95.a    LAPACK95 Interface (libmkl_lapack.a also required)
    • libmkl_solver.a    Sparse Solver Routines
    • libguide.a    Threading Library for Static Linking
  • lib/em64t    Libraries for Intel EM64T Applications
    • libmkl_em64t.a    Optimized Kernels (BLAS, CBLAS, Sparse BLAS, GMP, FFTs, DFTs, VML, VSL, Interval Arithmetic)
    • libmkl_lapack.a    LAPACK Routines
    • libmkl_lapack95.a    LAPACK95 Interface (libmkl_lapack.a also required)
    • libmkl_solver.a    Sparse Solver Routines
    • libguide.a    Threading Library for Static Linking

Here are some example combinations of linking options:

  (static linking of LAPACK and Kernels)
$ myfortrancompiler myprogram.f -L${MKLPATH} -lmkl_lapack -lmkl_ia32 -lguide -lpthread

  (static linking of Fortran-95 LAPACK Interface and Kernels)
$ myfortrancompiler myprogram.f95 -L${MKLPATH} -lmkl_lapack95 -lmkl_lapack -lmkl_ia32 -lguide -lpthread

  (static linking of BLAS, Sparse BLAS, GMP, VML/VSL, Interval Arithmetic, and FFT/DFT)
$ myccompiler myprogram.c -L${MKLPATH} -lmkl_ia32 -lguide -lpthread -lm

  (dynamic linking of BLAS or FFTs)
$ myccompiler myprogram.c -L${MKLPATH} -lmkl -lguide -lpthread

ITaP recommends that you use dynamic linking of libguide. If so, define LD_LIBRARY_PATH such that you are using the correct version of libguide at run time. If you use static linking of libguide (discouraged), then:

  • If you use the Intel compilers, link in the libguide version that comes with the compiler (use the -openmp option).
  • If you do not use the Intel compilers, link in the libguide version that comes with the Intel MKL above.

Here is some more documentation from other sources on the Intel MKL:

Mixing Fortran, C, and C++ Code on Unix

You may write different parts of a computing application in different programming languages. For example, an application might incorporate older, legacy code which performs numerical calculations written in Fortran. Systems functions might use C. A newer, main program which binds together all older code might use C++ to take advantage of the object orientation. This section illustrates a few simple examples.

For more information about mixing programming languages:

Using cpp with Fortran

If the source file ends with .F, .fpp, or .FPP, cpp automatically preprocesses the source code before compilation. If you want to use the C preprocessor with source files that do not end with .F, use the following compiler option to specify the filename suffix:

  • GNU Compilers: -x f77-cpp-input
    Note that preprocessing does not extend to the contents of files included by an "INCLUDE" directive. You must use the #include preprocessor directive instead.
    For example, to preprocess source files that end with .f:
    $ gfortran -x f77-cpp-input myprogram.f
    
  • Intel Compilers: -cpp
    To tell the compiler to link using C++ runtime libraries included with gcc/icc:
    $ ... -cxxlib -gcc/-cxxlib -icc
    
    For example, to preprocess source files that end with .f:
    $ ifort -cpp myprogram.f
    

Generally, it is advisable to rename your file from myprogram.f to myprogram.F. The preprocessor then automatically runs when you compile the file.
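If you have many source files to rename, the renaming advice above can be scripted. The following is a minimal sketch under stated assumptions: the function name rename_for_cpp is hypothetical, and it assumes a case-sensitive filesystem and that no file with the target .F name already exists.

```shell
# rename_for_cpp (hypothetical helper): rename every *.f file in the given
# directory to *.F so the compiler runs the preprocessor automatically.
rename_for_cpp() {
    dir=${1:-.}
    for f in "$dir"/*.f; do
        # Skip the unexpanded pattern when no .f files exist.
        [ -e "$f" ] || continue
        mv "$f" "${f%.f}.F"
    done
}

# Example use:
#   rename_for_cpp mydirectory
```

Files with other suffixes, such as .f90, do not match the pattern and are left untouched.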

For more information on combining C/C++ and Fortran:

C Program Calling Subroutines in Fortran, C, and C++

A C language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine. The C program calls the Fortran routine with the underscore character.
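The nm check described above can be narrowed to just the routine names of interest. The following is a minimal sketch: the function name fortran_symbols is hypothetical, and it assumes the usual three-column nm output (address, symbol type, name) in which defined routines carry the type letter "T".

```shell
# fortran_symbols (hypothetical helper): read nm output on standard input
# and print the defined routine names (type "T") that end in an underscore.
fortran_symbols() {
    awk '$2 == "T" && $3 ~ /_$/ { print $3 }'
}

# Example use on an object file produced by a Fortran compiler:
#   nm f90.o | fortran_symbols
```

Any name this prints is the name, underscore included, that the C program must use in its call.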

Fortran uses pass-by-reference while C uses pass-by-value. Therefore, for a Fortran routine to return a value to a C program, the argument in the call to the Fortran routine must be an address (taken with the ampersand "&" operator). For a C++ routine to return a value to a C program, the C++ routine may use the pass-by-reference syntax (ampersand "&") of C++, while the C program again passes an address (ampersand "&") in the call to the C++ routine.

The C++ compiler must know at the time of compiling the C++ routine that the C program will invoke the C++ routine with the C-style interface rather than the C++ interface.

The following files of source code illustrate these technical details:

Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

C Main Program

Intel compilers:
$ module load intel
$ icc -c main.c
$ ifort -c f90.f90
$ icc -c c.c
$ icc -c cpp.cpp
$ icc -lstdc++ main.o f90.o c.o cpp.o

GNU compilers:
$ module load gcc
$ gcc -c main.c
$ gfortran -c f90.f90
$ gcc -c c.c
$ g++ -c cpp.cpp
$ gcc -lstdc++ main.o f90.o c.o cpp.o

PGI compilers:
$ module load pgi
$ pgcc -c main.c
$ pgcc -c c.c
$ pgCC -c cpp.cpp
$ pgf90 -Mnomain main.o c.o cpp.o f90.f90

The results show that each routine successfully returns a different character to the main program:

$ a.out
main(), initial value:               chr=X
main(), after function subr_f_():    chr=f
main(), after function func_c():     chr=c
main(), after function func_cpp():   chr=+
Exit main.c

C++ Program Calling Subroutines in Fortran, C, and C++

A C++ language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine. The C++ program calls the Fortran routine with the underscore character.

Fortran uses pass-by-reference while C++ uses pass-by-value by default. Therefore, for a Fortran routine to return a value to a C++ program, the argument in the call to the Fortran routine must be an address (taken with the ampersand "&" operator). For a C routine to return a value to a C++ program, the C routine must declare a parameter as a pointer (asterisk "*"), while the C++ program again passes an address (ampersand "&") in the call to the C routine.

The C++ compiler must know at the time of compiling the C++ program that the C++ program will invoke the Fortran and C routines with the C-style interface rather than the C++ interface.

The following files of source code illustrate these technical details:

Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

C++ Main Program

Intel compilers:
$ module load intel
$ icc -c main.cpp
$ ifort -c f90.f90
$ icc -c c.c
$ icc -c cpp.cpp
$ icc -lstdc++ main.o f90.o c.o cpp.o

GNU compilers:
$ module load gcc
$ g++ -c main.cpp
$ gfortran -c f90.f90
$ gcc -c c.c
$ g++ -c cpp.cpp
$ g++ main.o f90.o c.o cpp.o

PGI compilers:
$ module load pgi
$ pgCC -c main.cpp
$ pgf90 -c f90.f90
$ pgcc -c c.c
$ pgCC -c cpp.cpp
$ pgCC -L../lib main.o c.o cpp.o f90.o -pgf90libs

The results show that each routine successfully returns a different character to the main program:

$ a.out
main(), initial value:               chr=X
main(), after function subr_f_():    chr=f
main(), after function func_c():     chr=c
main(), after function func_cpp():   chr=+
Exit main.cpp

Fortran Program Calling Subroutines in Fortran, C, and C++

A Fortran language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine, so the definitions of the C and C++ routines must include the underscore. The Fortran program calls these routines without the underscore character in the Fortran source code.

Fortran uses pass-by-reference while C uses pass-by-value. Therefore, for a C routine to return a value to a Fortran program, the parameter of the C routine must be a pointer (asterisk "*") in the C routine's definition. For a C++ routine to return a value to a Fortran program, the C++ routine may use the pass-by-reference syntax (ampersand "&") of C++ in its definition.

The C++ compiler must know at the time of compiling the C++ routine that the Fortran program will invoke the C++ routine with the C-style interface rather than the C++ interface.

The following files of source code illustrate these technical details:

Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

Fortran 90 Main Program

Intel compilers:
$ module load intel
$ ifort -c main.f90
$ ifort -c f90.f90
$ icc -c c.c
$ icc -c cpp.cpp
$ ifort -lstdc++ main.o f90.o c.o cpp.o

GNU compilers:
$ module load gcc
$ gfortran -c main.f90
$ gfortran -c f90.f90
$ gcc -c c.c
$ g++ -c cpp.cpp
$ gfortran -lstdc++ main.o c.o cpp.o f90.o

PGI compilers:
$ module load pgi
$ pgf90 -c main.f90
$ pgf90 -c f90.f90
$ pgcc -c c.c
$ pgCC -c cpp.cpp
$ pgf90 main.o c.o cpp.o f90.o

The results show that each routine successfully returns a different character to the main program:

$ a.out
 main(), initial value:               chr=X
 main(), after function subr_f():     chr=f
 main(), after function subr_c():     chr=c
 main(), after function func_cpp():   chr=+
 Exit mixlang

Running Jobs on Rossmann

There are two methods for submitting jobs to the Rossmann community cluster. First, you may use the Portable Batch System (PBS) to submit jobs directly to a queue on Rossmann. PBS performs job scheduling. Jobs may be serial, message-passing, shared-memory, or hybrid (message-passing + shared-memory) programs. You may use either the batch or interactive mode to run your jobs. Use the batch mode for finished programs; use the interactive mode only for debugging. Second, since the Rossmann cluster is a part of BoilerGrid, you may submit serial jobs to BoilerGrid and specifically request compute nodes on Rossmann.

Running Jobs via PBS

The Portable Batch System (PBS) is a richly featured workload management system providing a job scheduling and job management interface on computing resources, including Linux clusters. With PBS, a user requests resources and submits a job to a queue. The system will then take jobs from queues, allocate the necessary nodes, and execute them in as efficient a manner as it can.

Do NOT run large, long, multi-threaded, parallel, or CPU-intensive jobs on a front-end login host. All users share the front-end hosts, and running anything but the smallest test job will negatively impact everyone's ability to use Rossmann. Always use PBS to submit your work as a job. You may even submit interactive sessions as jobs. This section of documentation will explain how to use PBS.

Tips

  • Remember that ppn cannot be larger than the number of processor cores on each node.
  • If you compiled your own code, you must module load that same compiler from your job submission file. However, it is not necessary to load the standard compiler module if you load the corresponding compiler module with parallel libraries included.
  • To see a list of the nodes which ran your job: cat $PBS_NODEFILE
  • The order of processor cores is random. There is no way to tell which processor will do what or in which order in a parallel program.
  • If you use the tcsh or csh shell and a .logout file exists in your home directory, the exit status of your jobs will be that of the .logout script, not the job submission file. This may impact any interjob dependencies. To preserve the job exit status, remove the .logout file.

Queues

Rossmann, as a community cluster, has one or more queues dedicated to each partner who has purchased access to the cluster. These queues provide partners with priority access to their portion of the cluster. Additionally, community clusters provide a "standby" queue which is available to all cluster users. This "standby" queue allows users to utilize portions of the cluster that would otherwise be idle, but at a lower priority than partner-queue jobs, and with a relatively short time limit, to ensure "standby" jobs will not be able to tie up resources and prevent partner-queue jobs from running quickly.

To see a list of all queues on Rossmann, use the qstat -q command:

$ qstat -q

server: rossmann-adm.rcac.purdue.edu

Queue            Memory CPU Time Walltime Node   Run   Que   Lm  State
---------------- ------ -------- -------- ---- ----- ----- ----  -----
queue1             --      --    720:00:0  --      2     0   --   E R
queue2             --      --    720:00:0  --      5     2   --   E R
queue3             --      --    720:00:0  --      7     0   --   E R
 ...
standby            --      --    04:00:00  --      1    11   --   E R
                                               ----- -----
                                                  15    13
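When you only need the wall time limit of one queue, the listing above can be filtered. The following is a minimal sketch: the function name queue_walltime is hypothetical, and it assumes the column layout shown above, in which the Memory and CPU Time columns each hold a single token so that Walltime is the fourth field.

```shell
# queue_walltime (hypothetical helper): read "qstat -q" output on standard
# input and print the Walltime column for the named queue.
queue_walltime() {
    awk -v queue="$1" '$1 == queue { print $4 }'
}

# Example use on a front-end host:
#   qstat -q | queue_walltime standby
```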

Job Submission File

To submit work to a PBS queue, you must first create a job submission file. This job submission file is essentially a simple shell script. It will set any required environment variables, load any necessary modules, create or modify files and directories in your scratch space, and invoke any applications that you need. However, a job submission file can be as simple as the path to your application:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Print the hostname of the compute node on which this job is running.
/bin/hostname

Or, as simple as listing the names of compute nodes assigned to your job:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# PBS_NODEFILE contains the names of assigned compute nodes.
cat $PBS_NODEFILE

PBS sets several potentially useful environment variables which you may use within your job submission files. Here is a list of some:

Name             Description
PBS_O_WORKDIR    Absolute path of the current working directory when you submitted this job
PBS_JOBID        Job ID number assigned to this job by the batch system
PBS_JOBNAME      Job name supplied by the user
PBS_NODEFILE     File containing the list of nodes assigned to this job
PBS_O_HOST       Hostname of the system where you submitted this job
PBS_O_QUEUE      Name of the original queue to which you submitted this job
PBS_O_SYSTEM     Operating system name given by uname -s where you submitted this job
PBS_ENVIRONMENT  "PBS_BATCH" if this job is a batch job, or "PBS_INTERACTIVE" if this job is an interactive job
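Recording these variables at the top of a job's output can make it easier to trace a result back to the job that produced it. The following job submission file is a minimal sketch: the function name describe_job and the label text are hypothetical, and the "unset" fallback is only there so the script also runs outside of PBS.

```shell
#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# describe_job (hypothetical helper): print a few PBS variables, falling
# back to "unset" when the script runs outside of PBS.
describe_job() {
    echo "Job ID:      ${PBS_JOBID:-unset}"
    echo "Job name:    ${PBS_JOBNAME:-unset}"
    echo "Queue:       ${PBS_O_QUEUE:-unset}"
    echo "Submit host: ${PBS_O_HOST:-unset}"
    echo "Submit dir:  ${PBS_O_WORKDIR:-unset}"
    echo "Environment: ${PBS_ENVIRONMENT:-unset}"
}

describe_job
```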

Here is an example of a commonly used PBS variable, making sure a job runs from within the same directory that you submitted it from:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Change to the directory from which you originally submitted this job.
cd $PBS_O_WORKDIR

# Print out the current working directory path.
pwd

You may also find the need to load a module to run a job on a compute node. Loading a module on a front end does NOT automatically load that module on the compute node where a job runs. You must use the job submission file to load a module on the compute node:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

# Load the module for NetPBM.
module load netpbm

# Convert a PostScript file to GIF format using NetPBM tools.
pstopnm myfilename.ps | ppmtogif > myfilename.gif

Job Submission

Once you have a job submission file, you may submit this script to PBS using the qsub command. PBS will find an available processor core or a set of processor cores and run your job there, or leave your job in a queue until some become available. At submission time, you may also optionally specify many other attributes or job requirements you have regarding where your jobs will run.

To submit your serial job to one processor core on one compute node with no special requirements:

$ qsub myjobsubmissionfile

The previous example relies on two defaults, one compute node and one processor core, so it is equivalent to:

$ qsub -l nodes=1:ppn=1 myjobsubmissionfile

To submit your job to a specific queue:

$ qsub -q myqueuename myjobsubmissionfile

By default, each job receives 30 minutes of wall time for its execution. The wall time is the total time in real clock time (not CPU cycles) that you believe your job will need to run to completion. If you know that your job will not need more than a certain amount of time to run, it is very much to your advantage to request less than the maximum allowable wall time, as this may allow your job to schedule and run sooner. To request the specific wall time of 1 hour and 30 minutes:

$ qsub -l nodes=1:ppn=1,walltime=01:30:00 myjobsubmissionfile
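If you estimate run time in minutes, the HH:MM:SS value for the walltime resource can be computed rather than written by hand. The following is a minimal sketch: the function name walltime_from_minutes is hypothetical, it accepts whole minutes only, and the example prints the qsub command instead of submitting it.

```shell
# walltime_from_minutes (hypothetical helper): convert a whole number of
# minutes into the HH:MM:SS form used by the walltime resource.
walltime_from_minutes() {
    minutes=$1
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

# Example use (prints the command rather than submitting it):
echo "qsub -l nodes=1:ppn=1,walltime=$(walltime_from_minutes 90) myjobsubmissionfile"
```

For 90 minutes this reproduces the 01:30:00 request shown above.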

To request more than one processor core on one or more compute nodes:

$ qsub -l nodes=2:ppn=4 myjobsubmissionfile

The nodes resource indicates how many virtual nodes you would like reserved for your job. By default, PBS maps the nodes resource to a virtual node (that is, directly to a processor, not a full physical compute node). The node property ppn specifies how many processor cores you need on each virtual node. The previous example requests 2 virtual nodes with 4 processor cores each. PBS may or may not assign virtual nodes on different physical compute nodes. Each compute node in Rossmann has 24 processor cores. So, the two virtual nodes of this example can reside on a single compute node. Explanations regarding the distribution of your job across different compute nodes for parallel programs appear in the sections covering specific parallel programming libraries.

Here is a typical list of compute node names from a qsub command requesting 2 virtual nodes with 4 processor cores each:

rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
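A list like this has one line per assigned processor core, so counting repeated names shows how the cores are spread across compute nodes. The following is a minimal sketch: the function name cores_per_node is hypothetical, and it assumes a file in the same one-name-per-core format as $PBS_NODEFILE.

```shell
# cores_per_node (hypothetical helper): count how many processor cores
# each compute node contributes, given a file in $PBS_NODEFILE format
# (one node name per assigned core).
cores_per_node() {
    awk '{ count[$1]++ } END { for (node in count) print node, count[node] }' "$1" | sort
}

# Inside a job submission file:
#   cores_per_node "$PBS_NODEFILE"
```

For the list above, the output is one line per node: "rossmann-a638 4" followed by "rossmann-a639 4".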

Normally, compute nodes running your job may also be running jobs from other users. ITaP research systems have many processor cores in each compute node, so node sharing allows more efficient use of the system. However, if you have special needs that prohibit others from effectively sharing a compute node with your job, such as needing all of the memory on a compute node, you may request exclusive access to any compute nodes allocated to your job.

To request exclusive access to a compute node, set ppn to the maximum number of processor cores physically available on a compute node:

$ qsub -l nodes=1:ppn=24 myjobsubmissionfile

Note that if you request more than ppn=24 on Rossmann, your job will never run, because compute nodes in Rossmann have only 24 processor cores each.
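Since a request with too large a ppn is accepted but never runs, a quick check before submission can save a long wait. The following is a minimal sketch: the function name check_request and the variable CORES_PER_NODE are hypothetical, and the limit of 24 is taken from the description of Rossmann's compute nodes above.

```shell
# Assumption: 24 processor cores per compute node, as stated for Rossmann.
CORES_PER_NODE=24

# check_request (hypothetical helper): reject a ppn value that no node can
# satisfy, otherwise print the total number of cores requested.
check_request() {
    nodes=$1
    ppn=$2
    if [ "$ppn" -gt "$CORES_PER_NODE" ]; then
        echo "ppn=$ppn exceeds $CORES_PER_NODE cores per node" >&2
        return 1
    fi
    echo $((nodes * ppn))
}

# Example use:
#   check_request 2 4 && qsub -l nodes=2:ppn=4 myjobsubmissionfile
```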

If more convenient, you may also specify any command line options to qsub from within your job submission file, using a special form of comment:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile

#PBS -q myqueuename
#PBS -l nodes=1:ppn=24
#PBS -l walltime=01:30:00
#PBS -N myjobname

# Print the hostname of the compute node on which this job is running.
/bin/hostname

If an option is present in both your job submission file and on the command line, the option on the command line will take precedence.
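
Before overriding an option on the command line, it can be useful to see which options a job submission file already sets. The following is a minimal sketch: the helper name pbs_directives is hypothetical, and the sample file simply mirrors the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the qsub options embedded in a job submission file.
pbs_directives() {
    # Keep only lines that start with "#PBS" and strip that prefix.
    sed -n 's/^#PBS[[:space:]]*//p' "$1"
}

# Sample file mirroring the example above; the queue and job names are placeholders.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#!/bin/sh -l
#PBS -q myqueuename
#PBS -l nodes=1:ppn=24
#PBS -l walltime=01:30:00
#PBS -N myjobname
/bin/hostname
EOF

pbs_directives "$sample"
rm -f "$sample"
```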

After you submit your job with qsub, it can reside in a queue for minutes, hours, or even weeks. How long it takes for a job to start depends on the specific queue, the number of compute nodes requested, the amount of wall time requested, and the resources requested by other jobs already waiting in that queue. It is impossible to say for sure when any given job will start. For best results, request no more resources than your job requires.

PBS catches only output written to standard output and standard error. Standard output (output normally sent to the screen) will appear in your directory in a file whose extension begins with the letter "o", for example myjobsubmissionfile.o1234, where "1234" represents the PBS job ID. Errors that occurred during the job run and were written to standard error (output also normally sent to the screen) will appear in your directory in a file whose extension begins with the letter "e", for example myjobsubmissionfile.e1234. Often, the error file is empty. If your job wrote results to a file, those results will appear in that file.
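
The naming rule can be sketched as a small shell function. This is an illustration only: the helper name pbs_output_files is hypothetical, and it assumes the files are named after the job submission file, as in the example above (that is, no separate job name was set).

```shell
#!/bin/sh
# Hypothetical helper: print the expected stdout and stderr file names for a job.
pbs_output_files() {
    script=$1   # name of the job submission file
    jobid=$2    # numeric PBS job ID
    echo "$script.o$jobid"
    echo "$script.e$jobid"
}

pbs_output_files myjobsubmissionfile 1234
```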

Parallel applications may require special care in the selection of PBS resources. Please refer to the sections that follow for details on how to run parallel applications with various parallel libraries.

Job Status

The command qstat -a will list all jobs currently queued or running and some information about each:

$ qstat -a

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
107025.rossmann user123  standby  hello         --    1   8    --  00:05 Q   --
115505.rossmann user456  ncn      job4         5601   1   1    --  600:0 R 575:0
...
189479.rossmann user456  standby  AR4b          --    5  40    --  04:00 H   --
189481.rossmann user789  standby  STDIN        1415   1   1    --  00:30 R 00:07
189483.rossmann user789  standby  STDIN        1758   1   1    --  00:30 R 00:07
189484.rossmann user456  standby  AR4b          --    5  40    --  04:00 H   --
189485.rossmann user456  standby  AR4b          --    5  40    --  04:00 Q   --
189486.rossmann user123  tg_workq STDIN         --    1   1    --  12:00 Q   --
189490.rossmann user456  standby  job7        26655   1   8    --  04:00 R 00:06
189491.rossmann user123  standby  job11         --    1   8    --  04:00 Q   --

The status of each job listed appears in the "S" column toward the right. Possible status codes are: "Q" = Queued, "R" = Running, "C" = Completed, and "H" = Held.
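
When the list is long, a per-status tally gives a quicker overview. The following is a minimal sketch that assumes the column layout shown above, where the status code is the second-to-last field of each job line. The helper name count_states is hypothetical, and the sample lines are copied from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: tally jobs by status code from "qstat -a" output on stdin.
count_states() {
    # Job lines start with a numeric job ID followed by a dot;
    # the status code is the second-to-last field.
    awk '$1 ~ /^[0-9]+\./ { n[$(NF-1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Sample lines copied from the listing above.
count_states <<'EOF'
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
107025.rossmann user123  standby  hello         --    1   8    --  00:05 Q   --
115505.rossmann user456  ncn      job4         5601   1   1    --  600:0 R 575:0
189479.rossmann user456  standby  AR4b          --    5  40    --  04:00 H   --
189481.rossmann user789  standby  STDIN        1415   1   1    --  00:30 R 00:07
EOF
```

On the cluster, the same function could read live data with qstat -a | count_states.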

To see only your own jobs, use the -u option to qstat and specify your own username:

$ qstat -a -u myusername

rossmann-adm.rcac.purdue.edu:
                                                              Req'd  Req'd   Elap
Job ID          Username   Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- ---------- -------- ---------- ------ --- --- ------ ----- - -----
182792.rossmann myusername standby  job1        28422   1   4    --  23:00 R 20:19
185841.rossmann myusername standby  job2        24445   1   4    --  23:00 R 20:19
185844.rossmann myusername standby  job3        12999   1   4    --  23:00 R 20:18
185847.rossmann myusername standby  job4        13151   1   4    --  23:00 R 20:18

To retrieve useful information about your queued or running job, use the checkjob command with your job's ID number. The output should look similar to the following:

$ checkjob -v 163000

job 163000 (RM job '163000.rossmann-adm.rcac.purdue.edu')

AName: test
State: Idle 
Creds:  user:myusername  group:mygroup  class:myqueue
WallTime:   00:00:00 of 20:00:00
SubmitTime: Wed Apr 18 09:08:37
  (Time Queued  Total: 1:24:36  Eligible: 00:00:23)

NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 2
Total Requested Nodes: 1

Req[0]  TaskCount: 2  Partition: ALL  
TasksPerNode: 2  NodeCount:  1


Notification Events: JobFail

IWD:            /home/myusername/gaussian
UMask:          0000 
OutputFile:     rossmann-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.o163000
ErrorFile:      rossmann-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.e163000
User Specified Partition List:   rossmann-adm,SHARED
Partition List: rossmann-adm
SrcRM:          rossmann-adm  DstRM: rossmann-adm  DstRMJID: 163000.rossmann-adm.rcac.purdue.edu
Submit Args:    -l nodes=1:ppn=2,walltime=20:00:00 -q myqueue
Flags:          RESTARTABLE
Attr:           checkpoint
StartPriority:  1000
PE:             2.00
NOTE:  job violates constraints for partition rossmann-adm (job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160))

BLOCK MSG: job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160) (recorded at last scheduling iteration)

There are several useful bits of information in this output.

  • State lets you know if the job is Idle, Running, Completed, or Held.
  • WallTime will show how long the job has run and its maximum time.
  • SubmitTime is when the job was submitted to the cluster.
  • Total Requested Tasks is the total number of cores used for the job.
  • Total Requested Nodes and NodeCount are the number of nodes used for the job.
  • TasksPerNode is the number of cores used per node.
  • IWD is the job's working directory.
  • OutputFile and ErrorFile are the locations of stdout and stderr of the job, respectively.
  • Submit Args will show the arguments given to the qsub command.
  • NOTE/BLOCK MSG will show details on why the job isn't running. The message above says that the queue's hard limit of 160 processor cores (MAXPROC) is already in use, so the job has to wait. Other messages may give insight as to why the job fails to start or is held.
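
To pull a single field out of this output, a short filter is enough. The following is a minimal sketch assuming the "Label: value" layout shown above. The helper name checkjob_field is hypothetical, and the sample lines are copied from the checkjob output above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one labeled line from checkjob output on stdin.
checkjob_field() {
    awk -v key="$1" '
        index($0, key ":") == 1 {
            value = substr($0, length(key) + 2)
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim surrounding blanks
            print value
            exit
        }'
}

# Sample lines copied from the checkjob output above.
sample='State: Idle
WallTime:   00:00:00 of 20:00:00
Submit Args:    -l nodes=1:ppn=2,walltime=20:00:00 -q myqueue'

printf '%s\n' "$sample" | checkjob_field State
printf '%s\n' "$sample" | checkjob_field "Submit Args"
```

On the cluster, the input would come from the real command, for example checkjob -v myjobid | checkjob_field State.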

Job Cancellation

To stop a job before it finishes or remove it from a queue, use the qdel command:

$ qdel myjobid

You can find the job ID using the qstat command as explained in the Job Status section.
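
In this guide's examples, qsub prints job IDs of the form 99.rossmann-adm.rcac.purdue.edu. The following is a minimal sketch of reducing such an ID to its numeric part for use as myjobid. The helper name short_jobid is hypothetical, and the script prints the qdel command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: reduce a full PBS job ID to its numeric part.
short_jobid() {
    # Remove everything from the first dot onward.
    echo "${1%%.*}"
}

full_id=99.rossmann-adm.rcac.purdue.edu   # form printed by qsub in this guide's examples
echo "qdel $(short_jobid "$full_id")"
```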

Examples

To submit jobs successfully, you must understand how to request the right computing resources. This section contains examples of specific types of PBS jobs. These examples illustrate requesting various groupings of nodes and processor cores, using various parallel libraries, and running interactive jobs. You may wish to look here for an example that is most similar to your application and use a modified version of that example's job submission file for your jobs.

Batch

This simple example submits the job submission file hello.sub to the standby queue on Rossmann and requests 4 nodes:

$ qsub -q standby -l nodes=4,walltime=00:01:00 hello.sub
99.rossmann-adm.rcac.purdue.edu

Remember that ppn cannot be larger than the number of processor cores on each node.

After your job finishes running, the ls command will show two new files in your directory, the .o and .e files:

$ ls -l
hello
hello.c
hello.out
hello.sub
hello.sub.e99
hello.sub.o99

If everything went well, then the file hello.sub.e99 will be empty, since it contains any error messages your program gave while running. The file hello.sub.o99 contains the output from your program.

Using Environment Variables in a Job

If you would like to see the value of the environment variables from within a PBS job, you can prepare a job submission file with an appropriate filename, here named env.sub:

#!/bin/sh -l
# FILENAME:  env.sub

# Request four nodes, 1 processor core on each.
#PBS -l nodes=4:ppn=1,walltime=00:01:00
	
# Change to the directory from which you submitted your job.
cd $PBS_O_WORKDIR
	
# Show details, especially nodes.
# The results of the following commands go to standard output, so they appear in the output file.
echo $PBS_O_HOST
echo $PBS_O_QUEUE
echo $PBS_O_SYSTEM
echo $PBS_O_WORKDIR
echo $PBS_ENVIRONMENT
echo $PBS_JOBID
echo $PBS_JOBNAME

# PBS_NODEFILE contains the names of assigned compute nodes.
cat $PBS_NODEFILE

Submit this job:

$ qsub env.sub

Multiple Node

This section illustrates various requests for one or multiple compute nodes and ways of allocating the processor cores on these compute nodes. Each example submits a job submission file (myjobsubmissionfile.sub) to a batch session. The job submission file contains a single command cat $PBS_NODEFILE to show the names of the compute node(s) allocated. The list of compute node names indicates the geometry chosen for the job:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE

All examples use the default queue of the cluster.

One processor core on any compute node

The job shares the other resources of the compute node, in particular its memory, with other jobs. This request is typical of a serial job:

$ qsub -l nodes=1 myjobsubmissionfile.sub

Compute node allocated:

rossmann-a639

Two processor cores on any compute nodes

This request is typical of a distributed-memory (MPI) job:

$ qsub -l nodes=2 myjobsubmissionfile.sub

Compute node(s) allocated:

rossmann-a639
rossmann-a638

All processor cores on one compute node

The option ppn cannot be larger than the number of processor cores on each compute node on the machine in question. This request is typical of a shared-memory (OpenMP) job:

$ qsub -l nodes=1:ppn=24 myjobsubmissionfile.sub

Compute node allocated:

rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637
rossmann-a637

All processor cores on any two compute nodes

The option ppn cannot be larger than the number of processor cores on each compute node on the machine in question. This request is typical of a hybrid (distributed-memory and shared-memory) job:

$ qsub -l nodes=2:ppn=24 myjobsubmissionfile.sub

Compute nodes allocated:

rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639
rossmann-a639

rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638
rossmann-a638

Multinode geometry from option nodes is one processor core per node (scattered placement)

$ qsub -l nodes=8 myjobsubmissionfile.sub

rossmann-a001
rossmann-a003
rossmann-a004
rossmann-a005
rossmann-a006
rossmann-a007
rossmann-a008
rossmann-a009

Multinode geometry from option procs is one or more processor cores per node (free placement)

$ qsub -l procs=8 myjobsubmissionfile.sub

The placement of processor cores can range from all on one compute node (packed) to all on unique compute nodes (scattered). A few examples follow:

rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a001

rossmann-a001
rossmann-a001
rossmann-a001
rossmann-a002
rossmann-a002
rossmann-a003
rossmann-a004
rossmann-a004

rossmann-a000
rossmann-a001
rossmann-a002
rossmann-a003
rossmann-a004
rossmann-a005
rossmann-a006
rossmann-a007

Four compute nodes, each with two processor cores

$ qsub -l nodes=4:ppn=2 myjobsubmissionfile.sub

rossmann-a001
rossmann-a001
rossmann-a003
rossmann-a003
rossmann-a004
rossmann-a004
rossmann-a005
rossmann-a005

Eight processor cores can come from any four compute nodes

$ qsub -l nodes=4 -l procs=8 myjobsubmissionfile.sub

rossmann-a001
rossmann-a001
rossmann-a003
rossmann-a003
rossmann-a004
rossmann-a004
rossmann-a005
rossmann-a005

Exclusive access to one compute node, using one processor core

Achieving this geometry requires modifying the job submission file, here named myjobsubmissionfile.sub:

#!/bin/sh -l
# FILENAME:  myjobsubmissionfile.sub

cat $PBS_NODEFILE
uniq <$PBS_NODEFILE >nodefile
echo " "
cat nodefile

To gain exclusive access to a compute node, specify all processor cores that are physically available on a compute node:

$ qsub -l nodes=1:ppn=24 myjobsubmissionfile.sub

rossmann-a005
rossmann-a005
...
rossmann-a005

rossmann-a005

This request is typical of a serial job that needs access to all of the memory of a compute node.

Specific Types of Nodes

You may also request that a job be run on specific nodes based on various quantities such as node memory.

These examples submit a job submission file, here named myjobsubmissionfile.sub, to the default queue. The job submission file contains a single command (cat $PBS_NODEFILE) to show the allocated node(s).

Example: a job requires a node with 96 GB of memory:

$ qsub -l nodes=1:pmem=96G myjobsubmissionfile.sub 

Node allocated:

rossmann-b000

Example: a job requires a node with 192 GB of memory:

$ qsub -l nodes=1:pmem=192G myjobsubmissionfile.sub 

Node allocated:

rossmann-c000

Interactive Job

Interactive jobs can run on compute nodes. You can start interactive jobs either with specific time constraints (walltime=hh:mm:ss) or with the default time constraints of the queue to which you submit your job. PBS applies a wall time limit to every job, including interactive jobs.

If you request an interactive job without a wall time option, PBS assigns to your job the default wall time limit for the queue to which you submit. If this is shorter than the time you actually need, your job will terminate before completion. If, on the other hand, this time is longer than what you actually need, you are effectively withholding computing resources from other users. For this reason, it is best to always pass a reasonable wall time value to PBS for interactive jobs.
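
PBS expects wall time in hh:mm:ss form. As a convenience, the following minimal sketch converts a number of minutes into that form. The helper name walltime_from_minutes is hypothetical and not part of PBS.

```shell
#!/bin/sh
# Hypothetical helper: format a whole number of minutes as hh:mm:ss for -l walltime=.
walltime_from_minutes() {
    printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

walltime_from_minutes 1     # prints 00:01:00
walltime_from_minutes 90    # prints 01:30:00
```

The result can be passed to qsub with command substitution, for example -l walltime=$(walltime_from_minutes 90).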

Once your interactive job starts, you may use that connection as an interactive shell and invoke whatever other programs or other commands you wish. To submit an interactive job with one minute of wall time, use the -I option to qsub:

$ qsub -I -l walltime=00:01:00
qsub: waiting for job 100.rossmann-adm.rcac.purdue.edu to start
qsub: job 100.rossmann-adm.rcac.purdue.edu ready

If you need to use a remote X11 display from within your job (see the SSH X11 Forwarding section), add the -v DISPLAY option to qsub as well:

$ qsub -I -l walltime=00:01:00 -v DISPLAY
qsub: waiting for job 101.rossmann-adm.rcac.purdue.edu to start
qsub: job 101.rossmann-adm.rcac.purdue.edu ready

To quit your interactive job:

logout

Serial

A serial job is a single process whose steps execute as a sequential stream of instructions on one processor core.

This section illustrates how to use PBS to submit to a batch session one of the serial programs compiled in the section Compiling Serial Programs. There is no difference in running a Fortran, C, or C++ serial program after compiling and linking it into an executable file.

Suppose that you named your executable file serial_hello. Prepare a job submission file with an appropriate filename, here named serial_hello.sub:

#!/bin/sh -l
# FILENAME:  serial_hello.sub

module load devel
cd $PBS_O_WORKDIR

./serial_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

Submit the serial job to the default queue on Rossmann and request 1 compute node with 1 processor core and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00 ./serial_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
serial_hello
serial_hello.c
serial_hello.sub
serial_hello.sub.emyjobid
serial_hello.sub.omyjobid

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:rossmann-a639.rcac.purdue.edu   hello, world

If the job failed to run, then view error messages in the file serial_hello.sub.emyjobid.

If a serial job uses a lot of memory and finds the memory of a compute node overcommitted while sharing the compute node with other jobs, specify the number of processor cores physically available on the compute node to gain exclusive use of the compute node:

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 serial_hello.sub

View results in the output file:

$ cat serial_hello.sub.omyjobid
Runhost:rossmann-a639.rcac.purdue.edu   hello, world

MPI

A message-passing job is a set of processes (often multiple copies of a single program) that take advantage of distributed-memory systems by communicating with each other via the sending and receiving of messages. Work occurs across several compute nodes of a distributed-memory system. The Message-Passing Interface (MPI) is a standard for the message-passing model that defines a collection of library functions. Open MPI, MPICH2, and MVAPICH2 are three implementations of the MPI-2 standard.

This section illustrates how to use PBS to submit to a batch session one of the MPI programs compiled in the section Compiling MPI Programs. There is no difference in running a Fortran, C, or C++ MPI program after compiling and linking it into an executable file.

The path to relevant MPI libraries is not set up on any run host by default. Using module load is the preferred way to access these libraries. Use module avail to see all software packages installed on Rossmann, including MPI library packages. Then, to employ one of the available MPI modules, enter the module load command.

Suppose that you named your executable file mpi_hello. Prepare a job submission file with an appropriate filename, here named mpi_hello.sub:

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

module load devel
cd $PBS_O_WORKDIR

mpiexec -n 48 ./mpi_hello

You can load any MPI library/compiler module that is available on Rossmann (this example uses the recommended library, Open MPI).

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

You invoke an MPI program with the mpiexec command. The number of processes requested with mpiexec -n is usually equal to the number of MPI ranks of the application and should typically be equal to the total number of processor cores you request from PBS (more on this below).
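
Because the node file contains one line per allocated processor core, the value for mpiexec -n can be derived from it instead of being typed by hand. The following is a minimal sketch: the helper name count_ranks is hypothetical, and a small temporary file stands in for $PBS_NODEFILE, which exists only inside a running job.

```shell
#!/bin/sh
# Hypothetical helper: count the processor cores listed in a node file.
count_ranks() {
    # One line per allocated core; tr removes padding that some wc versions add.
    wc -l < "$1" | tr -d ' '
}

# Stand-in for $PBS_NODEFILE: 2 compute nodes with 2 cores each (made-up small request).
sample=$(mktemp)
printf '%s\n' rossmann-a010 rossmann-a010 rossmann-a011 rossmann-a011 > "$sample"

# Print the command instead of running it, so the sketch also works outside a job.
echo "mpiexec -n $(count_ranks "$sample") ./mpi_hello"
rm -f "$sample"
```

Inside a job submission file, the call would be count_ranks "$PBS_NODEFILE".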

Submit the MPI job to the default queue on Rossmann and request 2 compute nodes with all 24 processor cores and 24 MPI ranks on each compute node and 1 minute of wall time. This will use two complete compute nodes of the Rossmann cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 ./mpi_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
mpi_hello
mpi_hello.c
mpi_hello.sub
mpi_hello.sub.emyjobid
mpi_hello.sub.omyjobid

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:rossmann-a010.rcac.purdue.edu   Rank:0 of 48 ranks   hello, world
Runhost:rossmann-a010.rcac.purdue.edu   Rank:1 of 48 ranks   hello, world
   ...
Runhost:rossmann-a010.rcac.purdue.edu   Rank:23 of 48 ranks   hello, world
Runhost:rossmann-a011.rcac.purdue.edu   Rank:24 of 48 ranks   hello, world
Runhost:rossmann-a011.rcac.purdue.edu   Rank:25 of 48 ranks   hello, world
   ...
Runhost:rossmann-a011.rcac.purdue.edu   Rank:47 of 48 ranks   hello, world

If the job failed to run, then view error messages in the file mpi_hello.sub.emyjobid.

If an MPI job uses a lot of memory and 24 MPI ranks per compute node overcommit the memory of the compute nodes, specify more compute nodes and fewer processor cores (MPI ranks) on each compute node, while keeping the total number of MPI ranks unchanged.

Submit the job to the default queue with double the number of compute nodes and half the number of processor cores and MPI ranks per compute node (the total number of MPI ranks remains unchanged):

$ qsub -l nodes=4:ppn=12,walltime=00:01:00 ./mpi_hello.sub

View results in the output file:

$ cat mpi_hello.sub.omyjobid
Runhost:rossmann-c010.rcac.purdue.edu   Rank:0 of 48 ranks   hello, world
Runhost:rossmann-c010.rcac.purdue.edu   Rank:1 of 48 ranks   hello, world
   ...
Runhost:rossmann-c010.rcac.purdue.edu   Rank:11 of 48 ranks   hello, world
Runhost:rossmann-c011.rcac.purdue.edu   Rank:12 of 48 ranks   hello, world
Runhost:rossmann-c011.rcac.purdue.edu   Rank:13 of 48 ranks   hello, world
   ...
Runhost:rossmann-c011.rcac.purdue.edu   Rank:23 of 48 ranks   hello, world
Runhost:rossmann-c012.rcac.purdue.edu   Rank:24 of 48 ranks   hello, world
Runhost:rossmann-c012.rcac.purdue.edu   Rank:25 of 48 ranks   hello, world
   ...
Runhost:rossmann-c012.rcac.purdue.edu   Rank:35 of 48 ranks   hello, world
Runhost:rossmann-c013.rcac.purdue.edu   Rank:36 of 48 ranks   hello, world
Runhost:rossmann-c013.rcac.purdue.edu   Rank:37 of 48 ranks   hello, world
   ...
Runhost:rossmann-c013.rcac.purdue.edu   Rank:47 of 48 ranks   hello, world

The example shares the compute nodes with other jobs. This sharing may still overcommit the memory.
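
The arithmetic behind trading compute nodes for cores per node, while keeping the total number of MPI ranks fixed, can be sketched as follows. The helper name nodes_request is hypothetical. It only builds the resource string for qsub -l and does not submit anything.

```shell
#!/bin/sh
# Hypothetical helper: build a "nodes=N:ppn=P" request for a fixed total rank count.
nodes_request() {
    total=$1   # total number of MPI ranks
    ppn=$2     # ranks (processor cores) per compute node
    if [ $(( total % ppn )) -ne 0 ]; then
        echo "total ranks ($total) is not a multiple of ppn ($ppn)" >&2
        return 1
    fi
    echo "nodes=$(( total / ppn )):ppn=$ppn"
}

nodes_request 48 24   # prints nodes=2:ppn=24
nodes_request 48 12   # prints nodes=4:ppn=12
```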

To scatter 4 MPI ranks to 4 different compute nodes with each MPI rank having exclusive use of its compute node, apply the Linux command uniq to make a list of unique compute node names:

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile

mpiexec -n 4 -machinefile nodefile ./mpi_hello

$ qsub -l nodes=4:ppn=24,walltime=00:01:00 ./mpi_hello.sub

Runhost: rossmann-a637.rcac.purdue.edu   Rank: 0 of 4 ranks   hello, world
Runhost: rossmann-a636.rcac.purdue.edu   Rank: 1 of 4 ranks   hello, world
Runhost: rossmann-a634.rcac.purdue.edu   Rank: 2 of 4 ranks   hello, world
Runhost: rossmann-a633.rcac.purdue.edu   Rank: 3 of 4 ranks   hello, world

To distribute 8 MPI ranks to 4 different compute nodes with pairs of MPI ranks having exclusive use of their compute nodes, modify the output of uniq with pairs of compute node names:

#!/bin/sh -l
# FILENAME:  rankspernode

# For each unique compute node name, output two copies.
while read -r LINE; do
    echo "$LINE"
    echo "$LINE"
done

#!/bin/sh -l
# FILENAME:  mpi_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE | ./rankspernode >nodefile
                                                              
mpiexec -n 8 -machinefile nodefile ./mpi_hello

$ qsub -l nodes=4:ppn=24,walltime=00:01:00 ./mpi_hello.sub

Runhost: rossmann-a135.rcac.purdue.edu   Rank: 0 of 8 ranks   hello, world
Runhost: rossmann-a135.rcac.purdue.edu   Rank: 1 of 8 ranks   hello, world
Runhost: rossmann-a136.rcac.purdue.edu   Rank: 2 of 8 ranks   hello, world
Runhost: rossmann-a136.rcac.purdue.edu   Rank: 3 of 8 ranks   hello, world
Runhost: rossmann-a137.rcac.purdue.edu   Rank: 4 of 8 ranks   hello, world
Runhost: rossmann-a137.rcac.purdue.edu   Rank: 5 of 8 ranks   hello, world
Runhost: rossmann-a138.rcac.purdue.edu   Rank: 6 of 8 ranks   hello, world
Runhost: rossmann-a138.rcac.purdue.edu   Rank: 7 of 8 ranks   hello, world
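
The two-copies script generalizes to any number of MPI ranks per compute node. The following is a minimal sketch of that generalization using awk. The function name ranks_per_node is hypothetical, and the two node names are sample input standing in for the output of uniq on the node file.

```shell
#!/bin/sh
# Hypothetical helper: repeat each node name read from stdin N times.
ranks_per_node() {
    # $1 = number of copies (MPI ranks) wanted for each compute node.
    awk -v n="$1" '{ for (i = 0; i < n; i++) print }'
}

# Sample input standing in for the unique node names of a job.
printf '%s\n' rossmann-a135 rossmann-a136 | ranks_per_node 2
```

In a job submission file, it could replace the separate script, for example uniq <$PBS_NODEFILE | ranks_per_node 2 >nodefile.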

Notes

  • In general, the exact order in which MPI ranks output similar write requests to an output file is random.
  • When you use mpiexec, PBS will cleanly kill tasks that exceed their assigned limits of CPU time, wall clock time, memory usage, or disk space.
  • You can use mpiexec to enforce a security policy. If all jobs are required to start up using mpiexec and the PBS execution environment, it is not necessary to enable rsh or ssh access to the compute nodes in the cluster.
  • Use qstat -q to determine which queues are available. The name of the queue which is available to everyone on Rossmann is "standby".
  • Invoking an MPI program on Rossmann with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke an MPI program.

For an introductory tutorial on how to write your own MPI programs:

OpenMP

A shared-memory job is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over several processor cores of a multi-core processor. Open Multi-Processing (OpenMP) is a specification for the shared-memory model and consists of parallelization directives, library routines, and environment variables.

This section illustrates how to use PBS to submit to a batch session one of the OpenMP programs, either task parallelism or loop-level (data) parallelism, compiled in the section Compiling OpenMP Programs. There is no difference in running a Fortran, C, or C++ OpenMP program after compiling and linking it into an executable file.

Unless you tell it otherwise, the OpenMP runtime library chooses the number of threads for parallel execution on the processor cores of a compute node, typically one thread per core it detects. If you are running the program on a system with only one processor core, you will not see any speedup. In fact, the program may run more slowly due to the overhead of the synchronization code generated by the compiler. For best performance, the number of threads should typically be equal to the number of processor cores you will be using.

When running OpenMP programs, all threads should be on the same compute node to take advantage of shared memory.

To run an OpenMP program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS mynumberofthreads

In bash:

$ export OMP_NUM_THREADS=mynumberofthreads

You should also set the environment variable PARALLEL to 1. This variable must be set or else any timers used by the program will return incorrect timings (see the etime man page for more details).
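
For a job on a single compute node, the thread count can be derived from the node file instead of being hard-coded, which keeps OMP_NUM_THREADS in step with the ppn value you request. The following is a minimal sketch under that single-node assumption. The helper name threads_from_nodefile is hypothetical, and the script falls back to one thread when run outside a PBS job.

```shell
#!/bin/sh
# Hypothetical helper: use the number of allocated cores as the thread count.
# Assumes a single-node request (nodes=1:ppn=N), so every line names the same node.
threads_from_nodefile() {
    if [ -n "$1" ] && [ -r "$1" ]; then
        wc -l < "$1" | tr -d ' '
    else
        echo 1   # not inside a PBS job: fall back to a single thread
    fi
}

OMP_NUM_THREADS=$(threads_from_nodefile "${PBS_NODEFILE:-}")
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```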

Suppose that you named your executable file omp_hello. Prepare a job submission file with an appropriate name, here named omp_hello.sub:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=24

./omp_hello

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the program.

Submit the OpenMP job to the default queue on Rossmann and request 1 complete compute node with all 24 processor cores (OpenMP threads) on the compute node and 1 minute of wall time. This will use one complete compute node of the Rossmann cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 omp_hello.sub

View two new files in your directory (.o and .e):

$ ls -l
omp_hello
omp_hello.c
omp_hello.sub
omp_hello.sub.emyjobid
omp_hello.sub.omyjobid

View the results from one of the sample OpenMP programs about task parallelism:

$ cat omp_hello.sub.omyjobid
SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 24 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:23 of 24 threads   hello, world
SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

If the job failed to run, then view error messages in the file omp_hello.sub.emyjobid.

If an OpenMP program uses a lot of memory and 24 threads overcommit the memory of the compute node, specify fewer processor cores (OpenMP threads) on that compute node.

Modify the job submission file omp_hello.sub to use half the number of processor cores:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=12

./omp_hello

Submit the job to the default queue with half the number of processor cores:

$ qsub -l nodes=1:ppn=12,walltime=00:01:00 omp_hello.sub

View the results from one of the sample OpenMP programs about task parallelism and using half the number of processor cores:

$ cat omp_hello.sub.omyjobid

SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

To retain exclusive use of a compute node while using fewer OpenMP threads than the number of processor cores physically available on that compute node:

#!/bin/sh -l
# FILENAME:  omp_hello.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8
uniq <$PBS_NODEFILE >nodefile

./omp_hello

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 omp_hello.sub

SERIAL REGION:     Runhost:rossmann-a639.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:2 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:3 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:4 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:5 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:6 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a639.rcac.purdue.edu   Thread:7 of 8 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a639.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

Practice submitting the sample OpenMP program about loop-level (data) parallelism:

#!/bin/sh -l
# FILENAME:  omp_loop.sub

module load devel
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=24

./omp_loop

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 omp_loop.sub

SERIAL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 24 threads   Iteration:0  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 24 threads   Iteration:1  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 24 threads   Iteration:2  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:1 of 24 threads   Iteration:3  hello, world
   ...
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:23 of 24 threads   Iteration:46  hello, world
PARALLEL LOOP:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:23 of 24 threads   Iteration:47  hello, world
SERIAL REGION:   Runhost:rossmann-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

Hybrid

A hybrid job combines both message-passing and shared-memory attributes to take advantage of distributed-memory systems with multi-core processors. Work occurs across several compute nodes of a distributed-memory system and across the processor cores of the multi-core processors.

This section illustrates how to use PBS to submit to a batch session one of the hybrid programs compiled in the section Compiling Hybrid Programs. There is no difference in running a Fortran, C, or C++ hybrid program after compiling and linking it into an executable file.

The path to relevant MPI libraries is not set up on any run host by default. Using module load is the preferred way to access these libraries. Use module avail to see all software packages installed on Rossmann, including MPI library packages. Then, to employ one of the available MPI modules, enter the module load command.

The OpenMP runtime library automatically creates the optimal number of threads for execution in parallel on the multiple processor cores of a compute node. If you are running the program on a system with only one processor, you will not see any speedup. In fact, the program may run more slowly due to the overhead in the synchronization code generated by the compiler. For best performance, the number of threads should typically be equal to the number of processor cores you will be using.

When running hybrid programs, use all processor cores of the compute nodes to take advantage of shared memory.

To run a hybrid program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

In csh:

$ setenv OMP_NUM_THREADS mynumberofthreads

In bash:

$ export OMP_NUM_THREADS=mynumberofthreads

You should also set the environment variable PARALLEL to 1. This variable must be set or else any timers used by the program will return incorrect timings (see the etime man page for more details).
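
For bash or ksh users, both variables might be set together as in the following sketch. The thread count of 24 is an assumed value that matches one full Rossmann compute node; substitute your own. The csh and tcsh equivalents appear in the comments.

```shell
# Assumed thread count: one thread per core of a 24-core compute node.
OMP_NUM_THREADS=24
# The guide asks for PARALLEL=1 so that program timers report correctly.
PARALLEL=1
export OMP_NUM_THREADS PARALLEL

# csh or tcsh users would instead type:
#   setenv OMP_NUM_THREADS 24
#   setenv PARALLEL 1

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS PARALLEL=$PARALLEL"
```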

Suppose that you named your executable file hybrid_hello. Prepare a job submission file with an appropriate filename, here named hybrid_hello.sub:

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile 
export OMP_NUM_THREADS=24 

mpiexec -n 2 -machinefile nodefile ./hybrid_hello

You can load any MPI library/compiler module that is available on Rossmann. This example uses the recommended library Open MPI.

Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

You invoke a hybrid program with the mpiexec command. The number of processes requested with mpiexec -n is usually equal to the number of MPI ranks of the application (more on this below).
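
The uniq command in the job submission file matters here. As an illustration, assume that PBS writes one line to $PBS_NODEFILE for every processor core it allocates, so that a nodes=2:ppn=24 request yields 48 lines. Collapsing the repeated host names then leaves one line per compute node, and the line count is a natural value for mpiexec -n. The host names and the helper name count_nodes below are made up for the demonstration.

```shell
# count_nodes is a hypothetical helper: it prints how many lines remain
# after "uniq" collapses adjacent repeated host names in a node file.
count_nodes() {
    uniq < "$1" | wc -l | tr -d ' '
}

# Stand-in for $PBS_NODEFILE with 2 nodes and 3 cores each (made-up names;
# under the assumption above, a real nodes=2:ppn=24 request would list
# each host 24 times).
sample=$(mktemp /tmp/nodefile.XXXXXX)
printf '%s\n' node-a node-a node-a node-b node-b node-b > "$sample"

ranks=$(count_nodes "$sample")
echo "one MPI rank on each of $ranks compute nodes"
rm -f "$sample"
```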

Submit the hybrid job to the default queue on Rossmann and request 2 compute nodes with 1 MPI rank and all 24 processor cores (OpenMP threads) on each compute node and 1 minute of wall time. This will use two complete compute nodes of the Rossmann cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 hybrid_hello.sub
179168.rossmann-adm.rcac.purdue.edu

View two new files in your directory (.o and .e):

$ ls -l
hybrid_hello
hybrid_hello.c
hybrid_hello.sub
hybrid_hello.sub.emyjobid
hybrid_hello.sub.omyjobid

View the results from one of the sample hybrid programs about task parallelism:

$ cat hybrid_hello.sub.omyjobid

SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 24 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:23 of 24 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 24 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:23 of 24 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

If the job failed to run, then view error messages in the file hybrid_hello.sub.emyjobid.
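
A quick check of whether the error file contains anything might look like the following sketch. The helper name job_had_errors is hypothetical, and an empty .e file is only a hint of success, not proof.

```shell
# job_had_errors (hypothetical helper) succeeds when the given PBS error
# file exists and is non-empty, and prints its contents for inspection.
job_had_errors() {
    if [ -s "$1" ]; then
        echo "messages found in $1:"
        cat "$1"
        return 0
    fi
    return 1
}

# Example call; replace myjobid with the job number that qsub printed.
if job_had_errors hybrid_hello.sub.emyjobid; then
    echo "check the messages above"
else
    echo "no error output recorded"
fi
```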

If a hybrid job uses so much memory that 24 OpenMP threads per compute node overcommit the memory of a compute node, specify more compute nodes (MPI ranks) and fewer processor cores (OpenMP threads) on each compute node.

Prepare a job submission file with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

#!/bin/sh -l
# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile 
export OMP_NUM_THREADS=12

mpiexec -n 4 -machinefile nodefile ./hybrid_hello

Submit the job to the default queue on Rossmann with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

$ qsub -l nodes=4:ppn=12,walltime=00:01:00 hybrid_hello.sub
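
The arithmetic behind this trade can be sketched as follows: the total number of OpenMP threads stays fixed (2 x 24 = 4 x 12 = 48) while the threads per node shrink and the node count grows. The helper name hybrid_request is hypothetical, and the sketch assumes one MPI rank per compute node, as in the examples above.

```shell
# hybrid_request (hypothetical helper) prints a PBS resource string for a
# fixed total thread count spread over nodes running one MPI rank each.
hybrid_request() {
    total_threads=$1        # OpenMP threads summed over all nodes
    threads_per_node=$2     # OMP_NUM_THREADS for each MPI rank
    if [ $(( total_threads % threads_per_node )) -ne 0 ]; then
        echo "total threads must be a multiple of threads per node" >&2
        return 1
    fi
    echo "nodes=$(( total_threads / threads_per_node )):ppn=${threads_per_node}"
}

hybrid_request 48 24    # prints nodes=2:ppn=24
hybrid_request 48 12    # prints nodes=4:ppn=12
```

Remember that the value given to mpiexec -n and the value of OMP_NUM_THREADS in the job submission file must change to match the request.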

View the results from one of the sample hybrid programs about task parallelism with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

$ cat hybrid_hello.sub.omyjobid

SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:0 of 12 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:1 of 12 threads   hello, world
   ...
PARALLEL REGION:   Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:11 of 12 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a023.rcac.purdue.edu   Rank:3 of 4 ranks, Thread:0 of 1 thread    hello, world

To retain exclusive use of compute nodes while using fewer OpenMP threads than the number of processor cores physically available on each compute node:

#!/bin/sh -l
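# FILENAME:  hybrid_hello.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile
# Fewer threads than the 24 cores requested on each node (see ppn=24 below).
export OMP_NUM_THREADS=8

mpiexec -n 2 -machinefile nodefile ./hybrid_hello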

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 hybrid_hello.sub 

SERIAL REGION:     Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:2 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:3 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:4 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:5 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:6 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:7 of 8 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a637.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:2 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:3 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:4 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:5 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:6 of 8 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:7 of 8 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a634.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

Practice submitting the sample hybrid program about loop-level (data) parallelism:

#!/bin/sh -l
# FILENAME:  hybrid_loop.sub

module load devel
cd $PBS_O_WORKDIR
uniq <$PBS_NODEFILE >nodefile 
export OMP_NUM_THREADS=24 

mpiexec -n 2 -machinefile nodefile ./hybrid_loop

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 hybrid_loop.sub


SERIAL REGION:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 24 threads   Iteration:0   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 24 threads   Iteration:1   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 24 threads   Iteration:2   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 24 threads   Iteration:3   hello, world
   ...
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:23 of 24 threads   Iteration:46   hello, world
PARALLEL LOOP:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:23 of 24 threads   Iteration:47   hello, world
SERIAL REGION:   Runhost:rossmann-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 24 threads   Iteration:0   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 24 threads   Iteration:1   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 24 threads   Iteration:2   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 24 threads   Iteration:3   hello, world
   ...
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:23 of 24 threads   Iteration:46   hello, world
PARALLEL LOOP:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:23 of 24 threads   Iteration:47   hello, world
SERIAL REGION:   Runhost:rossmann-a045.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

Notes

  • In general, the exact order in which MPI processes of a hybrid program output similar write requests to an output file is random.
  • When you use mpiexec, PBS will cleanly kill tasks that exceed their assigned limits of CPU time, wall clock time, memory usage, or disk space.
  • You can use mpiexec to enforce a security policy. If all jobs are required to start up using mpiexec and the PBS execution environment, it is not necessary to enable rsh or ssh access to the compute nodes in the cluster.
  • Use qstat -q to determine which queues are available. The name of the queue which is available to everyone on Rossmann is "standby".
  • Invoking a hybrid program on Rossmann with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke a hybrid program.
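
Because lines from different MPI ranks can interleave, it can help to pull out one rank's lines at a time when reading the output file. The following is a minimal sketch that assumes the "Rank:N of" wording of the sample output in this section; the helper name lines_for_rank is hypothetical.

```shell
# lines_for_rank (hypothetical helper) prints only the lines that the
# given MPI rank wrote, relying on the "Rank:N of" text in each line.
lines_for_rank() {
    grep "Rank:$2 of " "$1"
}

# Small stand-in for a job output file with interleaved ranks.
out=$(mktemp /tmp/hybrid_out.XXXXXX)
cat > "$out" <<'EOF'
PARALLEL REGION:   Rank:1 of 2 ranks, Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Rank:0 of 2 ranks, Thread:0 of 24 threads   hello, world
PARALLEL REGION:   Rank:1 of 2 ranks, Thread:1 of 24 threads   hello, world
EOF

lines_for_rank "$out" 1    # prints the two lines written by rank 1
rm -f "$out"
```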

Scratch File

Some applications process data stored in a large input data file. This file may be too large to fit within the quota of a home directory, and it might reside on Fortress or some other external storage medium. The way to process this file on Rossmann is to copy it to your scratch directory, where a job running on a compute node of Rossmann may access it.
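
Copying the input file into place might look like the sketch below. It assumes that $RCAC_SCRATCH is set in your login environment, as the listings in this section suggest. The helper name stage_to_scratch and the source path in the example call are hypothetical.

```shell
# stage_to_scratch (hypothetical helper) copies one file into the scratch
# directory named by $RCAC_SCRATCH and lists the copy to confirm it.
stage_to_scratch() {
    if [ -z "${RCAC_SCRATCH:-}" ] || [ ! -d "$RCAC_SCRATCH" ]; then
        echo "RCAC_SCRATCH is not set or is not a directory" >&2
        return 1
    fi
    cp -p "$1" "$RCAC_SCRATCH/" && ls -l "$RCAC_SCRATCH/$(basename "$1")"
}

# Example call (hypothetical source path):
#   stage_to_scratch /path/to/mybiginputdatafile
```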

This section illustrates how to submit a small job that reads a data file residing on the scratch file system. This example, myprogram.c, displays the name of the compute node which runs the job, the path name of the current working directory, and the contents of that directory, and then copies the contents of an input scratch file to an output scratch file. The program uses Linux commands to obtain system information. To compile this program, see Compiling Serial Programs.

Prepare a scratch file directory with a large input data file:

$ ls -l $RCAC_SCRATCH
total 96
-rw-r----- 1 myusername itap   27 Jun  8 10:41 mybiginputdatafile

Prepare a job submission file with the path to your scratch file directory listed as a command-line argument and with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load devel
cd $PBS_O_WORKDIR

./myprogram $RCAC_SCRATCH

Submit this job to the default queue on Rossmann and request 1 processor core of 1 compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it.

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View two new files in the home directory (.o and .e):

$ ls -l
total 160
-rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
-rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
-rw------- 1 myusername itap    0 Jun  8 11:05 myjob.sub.e266283
-rw------- 1 myusername itap  780 Jun  8 11:05 myjob.sub.o266283
-rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram*
-rw-r--r-- 1 myusername itap 3930 Jun  8 11:13 myprogram.c

View one new file in the scratch file directory, mybigoutputdatafile:

$ ls -l $RCAC_SCRATCH
total 96
-rw-r----- 1 myusername itap   27 Jun  8 10:41 mybiginputdatafile
-rw-r--r-- 1 myusername itap   42 Jun  8 11:05 mybigoutputdatafile

View results in the output file:

$ cat myjob.sub.o266283
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
rossmann-d036.rcac.purdue.edu
/home/myusername
total 128
-rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
-rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
-rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram
-rw-r--r-- 1 myusername itap 3976 Jun  8 10:45 myprogram.c
total 128
-rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
-rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
-rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram
-rw-r--r-- 1 myusername itap 3976 Jun  8 10:45 myprogram.c
***  MAIN START  ***

input scratch file:   /scratch/lustreA/m/myusername/mybiginputdatafile
output scratch file:  /scratch/lustreA/m/myusername/mybigoutputdatafile
scratch file system:  textfromscratchfile

***  MAIN  STOP  ***

The output shows the name of the compute node which PBS chose to run the job, the path of the current working directory (the user's home directory), before-and-after listings of the content of the current working directory, and output from the application. The output scratch file named mybigoutputdatafile, the primary output of this program, appears in the scratch directory, not the home directory.

/tmp File

Some applications write a large amount of intermediate data to a temporary file early in a run, then read that data back for further processing later in the run. This file may be too large to fit within the quota of a home directory, or it may require too much I/O activity between the compute node and either the home directory or the scratch file directory. The way to process this intermediate file on Rossmann is to use the /tmp directory of the compute node which runs the job. Used properly, /tmp may provide faster local storage to an active process than any other storage option.

This section illustrates how to submit a small job that first writes and then reads an intermediate data file in the /tmp directory. This example, myprogram.c, displays the contents of the /tmp directory before and after processing. The program uses Linux commands to obtain system information. To compile this program, see Compiling Serial Programs.

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load devel
cd $PBS_O_WORKDIR

./myprogram

Submit this job to the default queue on Rossmann and request 1 processor core of 1 compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View results in the output file, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
-rw-r--r-- 1 myusername itap 12 Jun 16 11:36 /tmp/mytmpfile
***  MAIN START  ***

/tmp file data:  abcdefghijk

***  MAIN  STOP  ***

The output verifies the existence of the intermediate data file in the /tmp directory.

View results in the error file, myjob.sub.emyjobid:

ls: /tmp/mytmpfile: No such file or directory

The results in the error file verify that the intermediate data file does not exist at the start of processing.
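
The same write-then-read pattern can be tried from the shell. This is only a sketch of the idea, not the myprogram.c source, which this section does not show. The helper name roundtrip_tmp and the use of mktemp to pick a unique file name are choices made for the illustration.

```shell
# roundtrip_tmp (hypothetical helper) writes its argument to a uniquely
# named file in /tmp, reads it back, and removes the file afterwards.
roundtrip_tmp() {
    tmpfile=$(mktemp /tmp/mytmpfile.XXXXXX) || return 1
    printf '%s\n' "$1" > "$tmpfile"    # early phase: write intermediate data
    cat "$tmpfile"                     # later phase: read it back
    rm -f "$tmpfile"                   # leave the node-local disk clean
}

roundtrip_tmp abcdefghijk    # prints abcdefghijk
```

Removing the file at the end is a courtesy to the next job that lands on the same compute node.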

While the /tmp directory can provide faster local storage to an active process than other storage options, you never know how much storage is available in the /tmp directory of the compute node chosen to run your job. If an intermediate data file consistently fails to fit in the /tmp directories of a set of compute nodes, consider limiting the pool of candidate compute nodes to those which can handle your intermediate data file.
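
One way to find out at run time whether /tmp has room is to ask df before writing. The following sketch is an illustration only: the helper name tmp_free_kb and the 1024 KB requirement are assumptions, and the check says nothing about space that other jobs on the same node may consume later.

```shell
# tmp_free_kb (hypothetical helper) prints the free space, in kilobytes,
# of the file system holding the given directory (default /tmp).
# "df -P" selects the portable one-line-per-file-system output format.
tmp_free_kb() {
    df -Pk "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}

needed_kb=1024    # assumed size of the intermediate file, in KB
free_kb=$(tmp_free_kb /tmp)
if [ "$free_kb" -ge "$needed_kb" ]; then
    echo "/tmp has room: ${free_kb} KB free"
else
    echo "/tmp is too full: ${free_kb} KB free" >&2
fi
```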

Commercial and Third-Party Applications

Several commercial and third-party software packages are available on Rossmann and accessible through PBS.

We try to continually test the examples in the next few sections, but you may find some differences. If you need assistance, please contact us.

With the exception of Octave and R, which are free software, only Purdue affiliates may use the following licensed software.

Gaussian

Gaussian is a computational chemistry software package which works on electronic structure. This section illustrates how to submit a small Gaussian job to a PBS queue. This Gaussian example runs the Fletcher-Powell multivariable optimization.

Prepare a Gaussian input file with an appropriate filename, here named myjob.com. The final blank line is necessary:

#P TEST OPT=FP STO-3G OPTCYC=2

STO-3G FLETCHER-POWELL OPTIMIZATION OF WATER

0 1
O
H 1 R
H 1 R 2 A

R 0.96
A 104.
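
Because a missing final blank line is easy to overlook, a quick check before submitting might look like the following sketch. The helper name ends_with_blank_line is hypothetical; it only tests that the last line of the file is empty or whitespace.

```shell
# ends_with_blank_line (hypothetical helper) succeeds when the last line
# of the given file is empty or contains only whitespace.
ends_with_blank_line() {
    tail -n 1 "$1" 2>/dev/null | grep -q '^[[:space:]]*$'
}

if ends_with_blank_line myjob.com; then
    echo "myjob.com ends with a blank line"
else
    echo "myjob.com is missing, or lacks the final blank line" >&2
fi
```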

To submit this job, load Gaussian then run the provided script, named subg09. This job uses one compute node with 8 processor cores:

$ module load gaussian09/B.01
$ subg09 myjob -l nodes=1:ppn=8

View job status:

$ qstat -u myusername

View results in the file for Gaussian output, here named myjob.log. Only the first and last few lines appear here:

 Entering Gaussian System, Link 0=/apps/rhel5/g09-B.01/g09/g09
 Initial command:
 /apps/rhel5/g09-B.01/g09/l1.exe /scratch/scratch95/m/myusername/gaussian/Gau-7781.inp -scrdir=/scratch/scratch95/m/myusername/gaussian/
 Entering Link 1 = /apps/rhel5/g09-B.01/g09/l1.exe PID=      7782.
  
 Copyright (c) 1988,1990,1992,1993,1995,1998,2003,2009,2010,
            Gaussian, Inc.  All Rights Reserved.

.
.
.

 Job cpu time:  0 days  0 hours  1 minutes 37.3 seconds.
 File lengths (MBytes):  RWF=      5 Int=      0 D2E=      0 Chk=      1 Scr=      1
 Normal termination of Gaussian 09 at Wed Mar 30 10:49:02 2011.
real 17.11
user 92.40
sys 4.97
Machine:
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389
rossmann-a389

Use the ppn= specification as shown in the following examples. It does not affect the way the job runs, but it makes the #tasks entry in the qstat output appear correctly.

Examples of Gaussian PBS Job Submissions

Submit job using 4 processor cores on a single node:

$ subg09 myjob -l nodes=1:ppn=4,walltime=200:00:00 -q myqueuename

Submit job using 4 processor cores on each of 2 nodes:

$ subg09 myjob -l nodes=2:ppn=4,walltime=200:00:00 -q myqueuename

Submit job using 8 processor cores on a single node:

$ subg09 myjob -l nodes=1:ppn=8,walltime=200:00:00 -q myqueuename

Submit job using 8 processor cores on each of 2 nodes:

$ subg09 myjob -l nodes=2:ppn=8,walltime=200:00:00 -q myqueuename


Maple

Maple is a general-purpose computer algebra system. This section illustrates how to submit a small Maple job to a PBS queue. This Maple example differentiates, integrates, and finds the roots of polynomials.

Prepare a Maple input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Differentiate wrt x.
diff( 2*x^3,x );

# Integrate wrt x.
int( 3*x^2*sin(x)+x,x );

# Solve for x.
solve( 3*x^2+2*x-1,x );

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load maple
cd $PBS_O_WORKDIR

# Use the -q option to suppress startup messages.
# maple -q myjob.in
maple myjob.in

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load maple

# Use the -q option to suppress startup messages.
# maple -q << EOF
maple << EOF

# Differentiate wrt x.
diff( 2*x^3,x );

# Integrate wrt x.
int( 3*x^2*sin(x)+x,x );

# Solve for x.
solve( 3*x^2+2*x-1,x );
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
                                         2
                                      6 x

                                                           2
                      2                                   x
                  -3 x  cos(x) + 6 cos(x) + 6 x sin(x) + ----
                                                          2

                                    1/3, -1

Any output written to standard error will appear in myjob.sub.emyjobid.


Mathematica

Mathematica implements numeric and symbolic mathematics. This section illustrates how to submit a small Mathematica job to a PBS queue. This Mathematica example finds the three roots of a third-degree polynomial.

Prepare a Mathematica input file with an appropriate filename, here named myjob.in:

(* FILENAME:  myjob.in *)

(* Find roots of a polynomial. *)
p=x^3+3*x^2+3*x+1
Solve[p==0]
Quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load mathematica
cd $PBS_O_WORKDIR

math < myjob.in

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load mathematica
math << EOF

(* Find roots of a polynomial. *)
p=x^3+3*x^2+3*x+1
Solve[p==0]
Quit
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, here named myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Mathematica 5.2 for Linux x86 (64 bit)
Copyright 1988-2005 Wolfram Research, Inc.
 -- Terminal graphics initialized --

In[1]:=
In[2]:=
In[2]:=
In[3]:=
                     2    3
Out[3]= 1 + 3 x + 3 x  + x

In[4]:=
Out[4]= {{x -> -1}, {x -> -1}, {x -> -1}}

In[5]:=

View the standard error file, myjob.sub.emyjobid:

rmdir: ./ligo/rengel/tasks: Directory not empty
rmdir: ./ligo/rengel: Directory not empty
rmdir: ./ligo: Directory not empty


MATLAB (Licenses and Strategies)

MATLAB® (an acronym for MATrix LABoratory) is a general-purpose, high-level programming package which offers a fourth-generation programming language that enables computationally intensive tasks. It integrates a powerful programming language with computation and visualization to provide a flexible environment where problems and solutions appear in familiar mathematical notation. MATLAB allows the integration of external routines written in C, C++, Fortran, and Java with MATLAB applications. Built-in interfaces handle the importing of data from instruments, files, and external databases. MATLAB is a product of The MathWorks, a privately held company founded in 1984.

The MATLAB interpreter is the part of MATLAB which reads M-files and MEX-files and executes MATLAB statements. Simulink® is a graphical environment for simulation and Model-Based Design of multidomain dynamic and embedded systems. The Parallel Computing Toolbox (PCT) parallelizes MATLAB applications. The Distributed Computing Server (DCS) scales up PCT applications to compute clusters. Many other optional, add-on toolboxes (separately available collections of special-purpose MATLAB functions) extend the basic MATLAB package to solve particular classes of problems; they focus on individual areas of science and industry. The MATLAB Compiler™ (mcc) compiles a MATLAB application into a standalone program or software component. The term MATLAB can mean just the interpreter or the entire package.

Industries using MathWorks products include automobile, aerospace, communications, electronics, finance, industrial automation, and medicine. Areas of application include linear algebra and other calculations involving matrices or vectors of data, mathematics, statistics, signal processing, image processing, communications, control design, test and measurement, financial modeling and analysis, computational biology, algorithm development, simulation, data acquisition and analysis and visualization, and numeric and symbolic computation.

Purdue University has a system-wide license agreement with The MathWorks to use MATLAB for the purposes of teaching and research. Purdue's license number, your Purdue email address, and your MATLAB password provide access to MathWorks help desk, webinars, and other materials. matlabroot provides the path to the location where MATLAB is installed including the path to examples. To discover Purdue's license number, version details, and the path to examples:

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> license
819994
>> ver
>> disp(matlabroot)
/apps/rhel5/MATLAB/R2011b
>> quit;
$

MATLAB, Simulink, Compiler, and several of the optional toolboxes are available to faculty, staff, and students. To see the kind and quantity of all MATLAB licenses plus the number that you are currently using:

$ matlab_licenses
                                                  Licenses
MATLAB Product / Toolbox Name          myusername     Free    Total
==================================     ============================
Aerospace Blockset                              0       10       10
Aerospace Toolbox                               0       18       20
Bioinformatics Toolbox                          0       19       20
Communication Toolbox                           0       27       30
Compiler                                        0       14       15
Control Toolbox                                 0       60       75
Curve Fitting Toolbox                           0       37       75
Data Acq Toolbox                                0       10       10
Database Toolbox                                0        5        5
Datafeed Toolbox                                0        5        5
Dial and Gauge Blocks                           0       14       25
Econometrics Toolbox                            0       13       15
Excel Link                                      0        5        5
Financial Toolbox                               0       13       15
Fixed-Point Blocks                              0        5        5
Fixed Point Toolbox                             0       11       20
Fuzzy Toolbox                                   0        9       10
GADS Toolbox                                    0       13       15
Identification Toolbox                          0       15       15
Image Acquisition Toolbox                       0        5        5
Image Toolbox                                   0       61      100
Instr Control Toolbox                           0       10       15
MAP Toolbox                                     0       25       30
MATLAB                                          1      373    1,000
MATLAB Builder for dot Net                      0        1        1
MATLAB Coder                                    0       25       25
MATLAB Distrib Comp Server                      5       12       32
MATLAB Report Gen                               0        2        2
MBC Toolbox                                     0        5        5
MPC Toolbox                                     0        4        5
Neural Network Toolbox                          0       11       15
OPC Toolbox                                     0        1        1
Optimization Toolbox                            0       91      125
Parallel Computing Toolbox                      1       31       50
PDE Toolbox                                     0       13       15
Power System Blocks                             0       21       30
Real-Time Win Target                            0        8       15
Real-Time Workshop                              0        4       25
Robust Toolbox                                  0        5        5
RTW Embedded Coder                              0       15       15
Signal Blocks                                   0       28       30
Signal Toolbox                                  0       53      100
SimBiology                                      0        4        5
SimHydraulics                                   0       15       15
SimMechanics                                    0        4        5
Simscape                                        0       29       30
SIMULINK                                        0       65      100
Simulink Control Design                         0       15       15
Simulink Design Optim                           0        4        5
SIMULINK Report Gen                             0        2        2
SL Verification Validation                      0        4        5
Stateflow                                       0       13       15
Statistics Toolbox                              0       31      100
Symbolic Toolbox                                0       56       75
Virtual Reality Toolbox                         0        5        5
Wavelet Toolbox                                 0       14       15
XPC Target                                      0        9       20

The table shows the kind and quantity of MATLAB licenses which Purdue owns. The second column lists the number of licenses that you are currently using. The third column is a snapshot of the number of licenses currently available. The fourth column shows the total number of licenses which Purdue owns for each product. The table illustrates that while there are many MATLAB licenses, access to toolboxes is limited. Since Purdue's community of MATLAB users shares these licenses, users should plan an effective strategy so as not to prevent others from gaining access to MATLAB resources.

To limit the table to the MATLAB licenses that your own running jobs are using, add the -u option:

$ matlab_licenses -u
                                                  Licenses
MATLAB Product / Toolbox Name           myusername    Free    Total
==================================      ===========================
MATLAB                                           1     373    1,000
MATLAB Distrib Comp Server                       5      12       32
Parallel Computing Toolbox                       1      31       50
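
Before starting a job that needs a toolbox, it can help to check how many licenses are free. The following is a minimal sketch, not part of the matlab_licenses tool: the helper name free_licenses is hypothetical, and the parsing assumes the column layout shown above (product name followed by three numeric columns, with the "Free" count second from the right).

```shell
# free_licenses PRODUCT
# Reads matlab_licenses output on standard input and prints the value of the
# "Free" column for the named product. Assumes the last three fields of each
# product line are: in use, free, total.
free_licenses() {
    awk -v product="$1" '
        NF >= 4 {
            # Rebuild the product name from every field except the last three.
            name = $1
            for (i = 2; i <= NF - 3; i++) name = name " " $i
            if (name == product) {
                free = $(NF - 1)
                gsub(",", "", free)   # drop thousands separators such as 1,000
                print free
            }
        }'
}
```

A typical use would be to pipe the tool's output into the helper, for example: matlab_licenses | free_licenses "Optimization Toolbox"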

MathWorks expects their customers to use MATLAB interactively on a laptop or desktop. Purdue purchased licenses to run MATLAB on Linux clusters, a world of batch processing. The batch world of Linux clusters is very different from the interactive world of laptops. This difference requires another approach when applying MATLAB to large and compute-intensive applications, since you share resources on the clusters. Consider the analogy of the book. You can buy your personal copy of a book, or you can use a copy from a library. You can buy the compute cycles of a laptop, or you can use the compute cycles of a cluster. You can buy a MATLAB license, or you can use a MATLAB license on one of Purdue's community clusters.

In this shared environment, you must act like a "nice" user. There are two opportunities to be a "nice" user.

First, consider where you run your MATLAB client. Yes, you can log in to a front end of a cluster, use the module feature to load a MATLAB client, and interact with your MATLAB client via the command line prompt much like those who run MATLAB on a laptop. Purdue allows application development on the front end of a cluster. Once you finish development and you are ready to move your application to production, Purdue asks that you run your MATLAB application on the compute nodes of a cluster. This means either running your MATLAB client on a front end and using the MATLAB functions batch() or submit(), or running your MATLAB client on a compute node. The latter method uses the PBS qsub command to send a script to a compute node, which then runs MATLAB. It also involves making any related M-files and data files available to the compute nodes chosen to run your job. These methods avoid tying up the front end, which would prevent other users from doing their own development. So, this is one way to be a "nice" user.

The second opportunity to be a "nice" user is to consider how many MATLAB licenses your job requires and for how long. When you run a MATLAB client, you are using one MATLAB license. When you run a MATLAB parfor loop or a MATLAB spmd statement in a MATLAB pool job, you are using at least one additional license which comes from the Parallel Computing Toolbox. Running this job in the local configuration requires no further licenses. If you wish to use a scheduler like PBS to submit a MATLAB pool job to a compute node of a cluster, then you use yet more licenses which come from the MATLAB Distributed Computing Server. Running a MATLAB pool job with four MATLAB workers (labs) requires seven licenses (one MATLAB, one PCT, and five DCS licenses). At some point after development and before production, you should consider ways to reduce how many licenses your jobs use, for example using the local configuration or compiling your application. MathWorks allows linking MATLAB libraries to your compiled applications so that your jobs may run without using any MATLAB license. Also, MathWorks permits distributing standalone executables and software components royalty-free.
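
The license arithmetic for a scheduler-submitted pool job can be sketched as a small shell function. This is only an illustration: the function name pool_job_licenses is hypothetical, and the rule "DCS licenses = workers + 1" is an assumption generalized from the single four-worker example above, not a documented formula.

```shell
# pool_job_licenses WORKERS
# Prints an estimate of the licenses checked out by a MATLAB pool job that is
# submitted through a scheduler such as PBS.
pool_job_licenses() {
    workers=$1
    matlab=1                  # one license for the MATLAB client
    pct=1                     # one Parallel Computing Toolbox license
    dcs=$((workers + 1))      # assumed: one DCS license per worker, plus one
    echo $((matlab + pct + dcs))
}

pool_job_licenses 4    # prints 7, matching the four-worker example
```

Running the same job in the 'local' configuration would drop the DCS term entirely, which is why the guide recommends that configuration during development.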

MATLAB distinguishes three types of jobs (and three corresponding constructors): distributed (createJob()), pool (createMatlabPoolJob()), and parallel (createParallelJob()). A distributed job is one or more independent, single-processor-core tasks of MATLAB statements. Tasks may be identical or different; however, they do not interact with each other, and they need not run simultaneously. Tasks are distributed to workers as the workers become available, so a worker might process one or more tasks in succession. A serial job is just a distributed job with a single task which one processor core executes once. Typically, distributed jobs run parameter sweeps (running the same code with different inputs). A distributed job is also known as a task-parallel or embarrassingly parallel job.

A pool job involves code that requires one of the workers to distribute work to the other workers. One worker oversees the work accomplished by the other workers. The parfor and spmd statements of a pool job are similar to the parallel loop and parallel region, respectively, of the OpenMP Standard. Typically, a pool job implements a for loop whose iterations are independent, many, and long running. A pool job can also implement codistributed arrays as a means of handling data arrays which are too large to fit into the memory of any one compute node.

A parallel job is a single task running concurrently (in parallel) on two or more processor cores. The copies of the task are not independent; they may interact with each other. It is similar to a program running the Message-Passing Interface (MPI Standard). A parallel job is also known as a data-parallel job.

A MATLAB program may call user code written in C, C++, or Fortran (MEX file). The reverse is also true. A user program written in C, C++, or Fortran may call MATLAB functions or user-defined functions written in the MATLAB language (standalone program). MATLAB also offers a compiler which allows you to share your MATLAB applications as an executable or a shared library with end users outside the MATLAB environment.

A few core concepts of MATLAB organize your strategy when developing and submitting MATLAB jobs to a Linux compute cluster. The term MATLAB client simply refers to a running copy of MATLAB. A client may run on a front end or on a compute node. The location of a client is important since it can affect the kind and quantity of MATLAB licenses needed to run a job.

MATLAB has two kinds of schedulers: the 'local' scheduler and an installation-specific scheduler. In Purdue's case, the latter is named 'torque'. The 'local' scheduler runs a MATLAB job on the processor core(s) of the same compute node that is running the client (either a front end or a compute node of the cluster). Development work may occur on a front end. The 'torque' scheduler runs a MATLAB job on compute node(s) different from the node running the client. When using either scheduler, you typically use the submit() function and a sequence of related functions to set up the details of your job submission. When using the 'torque' scheduler, you may specify options that usually appear on the PBS qsub command. Production work should occur on compute node(s), not on the front end.

MATLAB offers two kinds of configurations: the 'local' configuration and user-defined configurations. The 'local' configuration runs a MATLAB job on the processor core(s) of the same compute node that is running the client (either a front end or a compute node of the cluster). Development work may occur on a front end. To run a MATLAB job on compute node(s) different from the node running the client, you must define your own configurations with the Configuration Manager. You find the Configuration Manager in the Parallel menu; MATLAB offers no set of functions equivalent to the Configuration Manager. When using either configuration, you typically use the batch() function to set up the details of your job submission. When using your own configuration, you may specify options that usually appear on the PBS qsub command. If your application runs best with a different compute node topology when you provide different initial conditions, you may submit that job with more than one user-defined configuration. Production work should occur on compute node(s), not on the front end.

Once your project is ready for production, your strategy becomes either compiling the MATLAB code into an executable file or using the Coder to generate standalone C and C++ code from MATLAB code. The generated source code is portable and readable. The MATLAB Compiler license is a lingering license. Using the compiler locks its license for at least 30 minutes. Each time you issue the compiler command, you reset the 30-minute timer. When running mcc at the MATLAB prompt, you hold the license as long as MATLAB remains open. To release the license, quit MATLAB. Quitting MATLAB is necessary to release the license, but not sufficient. When you quit MATLAB before the 30-minute timer runs out, the license remains checked out for the remaining time. When running mcc at the Linux prompt, the 30-minute timer starts or restarts. The license manager tracks MATLAB licenses per user per cluster. If you run the MATLAB Compiler on two different clusters, you use two Compiler licenses. To minimize your license usage, run the Compiler on only one cluster.
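
The lingering-license timer can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name compiler_license_seconds_left is hypothetical, the linger period is taken as exactly 30 minutes (the text says "at least"), and MATLAB is assumed to have been closed already. Treat the result as a lower bound.

```shell
# compiler_license_seconds_left LAST_MCC NOW
# Both arguments are times in seconds since the epoch (for example from
# "date +%s"). Prints how many seconds the Compiler license is expected to
# stay checked out after the most recent mcc command.
compiler_license_seconds_left() {
    linger=1800                          # 30 minutes, per the text above
    left=$(( $1 + linger - $2 ))
    if [ "$left" -lt 0 ]; then
        left=0                           # the timer has already expired
    fi
    echo "$left"
}

compiler_license_seconds_left 1000 1600    # prints 1200 (20 minutes left)
```

Because every mcc command restarts the timer, only the time of the most recent compile matters.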

The following sections provide several examples illustrating how to submit MATLAB jobs to a Linux compute cluster. They also explain the kind and quantity of MATLAB licenses for each method. When developing your application, use a method that submits your MATLAB client to a compute node while using MATLAB's 'local' configuration. This avoids competing for the limited number of DCS licenses. When running your application in production mode, use the Compiler or Coder, use compute nodes, not the front end, and use the minimal number of MATLAB licenses possible.

Finally, MATLAB offers implicit parallelism in the form of thread-parallel enabled functions. This is different from the explicit parallelism of the Parallel Computing Toolbox. When you know that you are developing a serial job and you are unsure whether you are calling one of MATLAB's thread-parallel enabled functions, run MATLAB with implicit parallelism turned off: -singleCompThread. When you know that you want to use a thread-parallel enabled function for its parallelism, request exclusive use of a node by setting ppn= to the number of processor cores physically available on the compute node of a cluster.

MATLAB (Configuration Manager)

Use the Configuration Manager in the Parallel menu to prepare your PBS configuration. This configuration contains the PBS details (queue, nodes, ppn, walltime, etc.) of your job submission. Ultimately, your PBS configuration will be an argument of the MATLAB functions batch() or findResource(). Alternatively, you can make your PBS configuration the default configuration which function batch() reads during job submission. If you have several applications and each requires different PBS command-line options, then each application may have its own configuration. To make your PBS configuration, load a MATLAB module on a front end and verify the version of the MATLAB module loaded. Run a MATLAB client with the desktop showing. First, discover the current list of configurations; most likely, just 'local'. Then select the Parallel menu:

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -singleCompThread
>> [current all] = defaultParallelConfig
>> disp(current)
local
>> disp(all)
    'local'
>>
Parallel
Manage Configurations

In the Configuration Manager dialog box, select File:

File
New
torque

Enter properties as needed. Here are a few suggestions.

Enter a Configuration name:

mypbsconfig

In the Jobs tab, enter an appropriate value for the minimum and maximum number of workers.

ClusterMatlabRoot (use the path of the chosen version of MATLAB):

apps/rhel5/MATLAB_R2010a
apps/rhel5/MATLAB/R2010b
apps/rhel5/MATLAB/R2011b

ClusterSize (the number of DCS licenses available):

128

ResourceTemplate (PBS command-line options):

-l nodes=^N^

SubmitArguments (PBS command-line options):

-q myqueuename -l walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+^N^

ClusterOsType:

unix

HasSharedFileSystem:

True

RshCommand:

ssh

Export your PBS configuration:

OK
Right-click the name of the PBS configuration
Export
File
New
Save

MATLAB (Interpreting an M-file)

The MATLAB interpreter is the part of MATLAB which reads M-files and MEX-files and executes MATLAB statements.

This section illustrates five methods of submitting a small, serial MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and generates three random numbers. The MATLAB call system('hostname') returns two values: a status code and the run host name.

The first method runs on a front end a MATLAB client which runs the MATLAB batch() function with a PBS configuration. Function batch() is a wrapper for function submit(). Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out three licenses: one MATLAB license for the client running on the front end, one PCT license, and one DCS license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as batch(), and quitting MATLAB. The DCS license remains active between running function batch() and job completion. The DCS license does not appear in the output of function license().

The second method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB batch() function with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as batch(), and quitting MATLAB. This job is completely off the front end.

The third method runs on a front end a MATLAB client which runs the MATLAB submit() function with the 'torque' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'torque' scheduler, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out three licenses: one MATLAB license for the client running on the front end, one PCT license, and one DCS license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. The DCS license remains active between running function submit() and job completion. The DCS license does not appear in the output of function license().

The fourth method uses the PBS qsub command to submit to a compute node a MATLAB client which runs the MATLAB submit() function with the 'local' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'local' scheduler, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The fifth method uses the PBS qsub command to submit to a compute node a MATLAB client which interprets an M-file with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter; so, it requires and checks out one license: one MATLAB license for the client running on a compute node. The MATLAB license remains active between starting and quitting MATLAB. This job is completely off the front end.

The following table summarizes MATLAB license usage:

Method  Description                                  MATLAB  PCT  DCS  mcc  Limitations
======  ===========================================  ======  ===  ===  ===  ==========================================================
1       batch() with user-defined PBS configuration       1    1    1    0  number of MATLAB, PCT, DCS licenses purchased
2       batch() with 'local' configuration, qsub          1    1    0    0  local configuration with 8 (R2009a) and 12 (R2011a) workers
3       submit() with 'torque' scheduler                  1    1    1    0  number of MATLAB, PCT, DCS licenses purchased
4       submit() with 'local' scheduler, qsub             1    1    0    0  local scheduler with 8 (R2009a) and 12 (R2011a) workers
5       qsub                                              1    0    0    0  number of MATLAB licenses purchased

Prepare a MATLAB serial program in the form of a MATLAB script M-file and a MATLAB function M-file with appropriate filenames, here named myscript.m and myfunction.m:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name)

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

The function M-file returns a single value: a concatenation of the name of the compute node which runs the function and the three random numbers.

For the first method of job submission, use a MATLAB M-file (MATLAB function batch() accepts either a script M-file or a function M-file).

At the MATLAB prompt, discover which MATLAB licenses are in use. View the current default parallel configuration; most likely, it is the 'local' configuration. Call MATLAB function batch() to run the MATLAB code in the file myscript.m with your PBS configuration which you previously designed with the Configuration Manager (using the 'local' configuration at this point will run the MATLAB program on the front end). Also, capture job output in the diary. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). While your job is running, get a list of the licenses in use. The list includes the MATLAB license and the PCT license. The DCS license of the worker does not show. After your job finishes, verify that the PCT license remains in use. View results by either viewing the diary, loading the job, or getting all output arguments into a cell array. When you do not want to keep the files of your job, use MATLAB function destroy() to erase them from your current working directory. After this job finishes, the default parallel configuration remains the 'local' configuration.

>> license('inuse')
matlab
>> disp(defaultParallelConfig);
local
>> job = batch('myscript','Configuration','mypbsconfig','CaptureDiary',true);
>> !qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
98237.rossmann-ad myusername standby  Job1Task1     --    1   1    --  00:01 Q   -- 
>> 
>> disp(job.get('State'))
queued
>> disp(job.get('State'))
running
>> license('inuse')
distrib_computing_toolbox
matlab
>> disp(job.get('State'))
finished
>> license('inuse')
distrib_computing_toolbox
matlab
>> job.diary


hostname:rossmann-a000.rcac.purdue.edu

0.9173 0.6839 0.8661

>> who       

Your variables are:

ans  job  

>> job.load
>> who

Your variables are:

A     ans   c     job   name 
 
>> disp(name)
hostname:rossmann-a000.rcac.purdue.edu

>> disp(A)
    0.9173    0.6839    0.8661

>> result = getAllOutputArguments(job);
>> result{1}

ans = 

       A: [0.9173 0.6839 0.8661]
     ans: 'local'
       c: 0
    name: hostname:rossmann-a000.rcac.purdue.edu


>> disp(result{1}.name)
hostname:rossmann-a000.rcac.purdue.edu

>> disp(result{1}.A)
    0.9173    0.6839    0.8661

>> ls -l
>> job.destroy;
>> ls -l
>> disp(defaultParallelConfig);
local
>> quit;
$

The license name distrib_computing_toolbox refers to the Parallel Computing Toolbox.

The qstat output shows that the MATLAB client submitted this job as one compute node (NDS) with one processor core (TSK) and a requested wall time of one minute.

Output demonstrates three ways to access the results: diary, load, and getAllOutputArguments(). Output shows the name of the compute node (a000) which processed the file myscript.m and three random numbers.

After calling the MATLAB function batch(), you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with the PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy(), so you may rerun MATLAB and find your job:

$ module load matlab/R2011b
$ matlab -nodisplay -singleCompThread
>> sched=findResource('scheduler','Configuration','mypbsconfig');
>> job=findJob(sched,'State','finished');
>> job.diary
>> job.load
>> name
>> A
>> result = getAllOutputArguments(job);
>> result{1}.name
>> result{1}.A
>> destroy(job);
>> quit
$

To apply the first method of job submission to a function M-file, use one of the following sequences:

>> job=batch('myfunction','Configuration','mypbsconfig','CaptureDiary',true);
>> disp(job.get('State'))
finished
>> job.diary
>> job.load
>> ans
>> result=getAllOutputArguments(job);
>> result{1}.ans

>> job=batch('myfunction',1,{},'Configuration','mypbsconfig');                    
>> disp(job.get('State'))
finished
>> result=getAllOutputArguments(job);
>> result{1}

>> job=batch(@myfunction,1,{},'Configuration','mypbsconfig');
>> disp(job.get('State'))
finished
>> result=getAllOutputArguments(job);
>> result{1}

Function batch() offers an alternate mode of operation. If you omit the property Configuration and its value from the list of arguments of function batch(), then batch() reads the default parallel configuration to discover your instructions about submission. Before calling batch(), set the default parallel configuration to the PBS configuration which you previously designed with the Configuration Manager. This change of the default parallel configuration is immediate and permanent; the PBS configuration remains the default parallel configuration during subsequent runs of MATLAB.

To scale up this method to handle a real application, use the Configuration Manager in the Parallel menu to increase the wall time to accommodate a longer running job. Enter a new wall time in the property SubmitArguments.

The second method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first method since it uses the same MATLAB M-file, either myscript.m or myfunction.m. This method differs from the first because the MATLAB client runs on a compute node rather than on the front end, and the client uses the MATLAB 'local' configuration rather than a user-defined PBS configuration.

Prepare a MATLAB script M-file that calls MATLAB function batch() which specifies the 'local' configuration and which captures job output in the diary. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

job=batch('myscript','Configuration','local','CaptureDiary',true);
job.wait;
job.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR 
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator 
matlab -nodisplay -singleCompThread -r mylclbatch

Submit the job as a single compute node with two processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=2,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m; one processor core runs the MATLAB M-file.
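
The gres= license request follows a fixed pattern, so it can be generated rather than typed by hand. The sketch below assumes the resource names and the "+" and "%" separators exactly as they appear in the qsub and SubmitArguments examples in this guide; the function name matlab_gres is hypothetical.

```shell
# matlab_gres PCT [DCS]
# Prints a gres= resource string requesting PCT Parallel Computing Toolbox
# licenses and, when DCS is greater than zero, DCS Distributed Computing
# Server licenses.
matlab_gres() {
    pct=$1
    dcs=${2:-0}
    gres="gres=Parallel_Computing_Toolbox+${pct}"
    if [ "$dcs" -gt 0 ]; then
        gres="${gres}%MATLAB_Distrib_Comp_Server+${dcs}"
    fi
    # Pass the string as an argument so that "%" is not treated as a format.
    printf '%s\n' "$gres"
}

matlab_gres 1      # prints gres=Parallel_Computing_Toolbox+1
matlab_gres 1 4    # prints gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4
```

The result could then be substituted into the submission, for example: qsub -l nodes=1:ppn=2,walltime=00:01:00,$(matlab_gres 1) myjob.sub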

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu:
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername   standby  myjob.sub   30197   1   2    --  00:01 R 00:00

Output shows two processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


mylclbatch.m
rossmann-a639.rcac.purdue.edu


hostname:rossmann-a639.rcac.purdue.edu

0.917276 0.683883 0.866076

Output shows that processor cores on one compute node (a639) processed the entire job. One processor core processed myjob.sub and mylclbatch.m. One processor core processed the batch script myscript.m. Output also displays three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.
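
The names of the two output files can be derived from the job ID that qsub prints at submission. The following is a minimal sketch, assuming the job ID has the form number.servername (as in the qstat listings) and that the files follow the scriptname.oNUMBER and scriptname.eNUMBER pattern described above. The function name pbs_output_files and the job ID used in the example call are illustrative only.

```shell
# pbs_output_files SCRIPT JOBID
# Prints the expected standard output and standard error file names for a
# job submitted with "qsub SCRIPT".
pbs_output_files() {
    script=$1
    seq=${2%%.*}        # keep only the digits before the first dot
    printf '%s.o%s\n%s.e%s\n' "$script" "$seq" "$script" "$seq"
}

pbs_output_files myjob.sub 99025.rossmann-adm.rcac.purdue.edu
# prints:
#   myjob.sub.o99025
#   myjob.sub.e99025
```

In a script, the job ID would typically be captured at submission time, for example with myjobid=$(qsub myjob.sub).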

To apply the second method of job submission to a function M-file, modify mylclbatch.m with one of the following sequences:

job=batch('myfunction','Configuration','local','CaptureDiary',true);
job.wait;
job.diary

job=batch('myfunction',1,{},'Configuration','local');
job.wait;
result = getAllOutputArguments(job);
result{1}

job=batch(@myfunction,1,{},'Configuration','local');
job.wait;
result = getAllOutputArguments(job);
result{1}

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.
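
When scaling up, the wall time must be written in the HH:MM:SS form used in the qsub examples. A small helper can convert a run-time estimate in minutes into that form. This is a sketch only; the function name walltime_from_minutes is hypothetical.

```shell
# walltime_from_minutes MINUTES
# Prints MINUTES as a PBS wall time string in HH:MM:SS form.
walltime_from_minutes() {
    minutes=$1
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

walltime_from_minutes 1     # prints 00:01:00
walltime_from_minutes 90    # prints 01:30:00
```

For example, a 90-minute job could be submitted with: qsub -l nodes=1:ppn=2,walltime=$(walltime_from_minutes 90),gres=Parallel_Computing_Toolbox+1 myjob.sub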

For the third method of job submission, use the MATLAB function M-file myfunction.m (MATLAB function submit() accepts only a function M-file).

Prepare a MATLAB script M-file which finds the MATLAB 'torque' scheduler, which specifies one minute of wall time for this short job and the number of PCT and DCS licenses, which defines a MATLAB serial job and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mypbssubmit.m:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler', 'type', 'torque');
set(sched,'ClusterMatlabRoot',matlabroot);
set(sched,'SubmitArguments','-l walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+1');
job = createJob(sched);
set(job,'FileDependencies',{'myfunction.m'});
task = createTask(job,@myfunction,1,{});
submit(job);
disp('FINISHED SUBMITTING')

On a front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, run mypbssubmit.m. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, get all output arguments into a cell array and display the results. When you do not want to keep the files of a job, use MATLAB function destroy() to erase them from your current working directory.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay -singleCompThread
>> mypbssubmit
FINISHED SUBMITTING
>> !qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
98238.rossmann-ad myusername standby  Job2Task1   26391   1   1    --  00:01 Q 00:00
>> disp(job.get('State'))
queued
>> disp(job.get('State'))
running
>> disp(job.get('State'))
finished
>> result = getAllOutputArguments(job)

result =

    [2x37 char]

>> result{1}

ans =

hostname:rossmann-a639.rcac.purdue.edu

0.917276 0.683883 0.866076 

>> ls -l
>> job.destroy;
>> ls -l
>> quit
$

The qstat output shows that the MATLAB client submitted this job as one compute node (NDS) with one processor core (TSK) and the requested wall time of one minute.

Output shows the name of the compute node (a639) which processed the file myfunction.m. Output also displays the three random numbers.

After the "FINISHED SUBMITTING" message, you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy(), so you may rerun MATLAB and find your job:

$ module load matlab/R2011b
$ matlab -nodisplay -singleCompThread
>> sched=findResource('scheduler','type','torque');
>> job=findJob(sched,'State','finished');
>> result = getAllOutputArguments(job);
>> result{1}
>> job.destroy;
>> quit
$

Function submit() offers an alternate mode of operation. It can read your PBS configuration made previously with the Configuration Manager. Here is an alternate version of mypbssubmit.m which relies on mypbsconfig for the several details of job submission:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','configuration','mypbsconfig');
job = createJob(sched);
set(job,'FileDependencies',{'myfunction.m'});
task = createTask(job,@myfunction,1,{});
submit(job);
disp('FINISHED SUBMITTING');

To scale up this method to handle a real application, increase the wall time in mypbssubmit.m to accommodate a longer running job.

The fourth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the third method since it uses the same MATLAB function M-file myfunction.m (MATLAB function submit() accepts only a function M-file). This method differs from the third because the MATLAB client runs on a compute node rather than on the front end, and the client uses the MATLAB 'local' scheduler rather than the MATLAB 'torque' scheduler.

Prepare a MATLAB script M-file which finds the MATLAB 'local' scheduler, which defines a MATLAB serial job and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mylclsubmit.m:

% FILENAME:  mylclsubmit.m

!echo "mylclsubmit.m"
!hostname

sched = findResource('scheduler', 'type', 'local');
set(sched,'ClusterMatlabRoot',matlabroot);
job = createJob(sched);
set(job,'FileDependencies',{'myfunction.m'});
task = createTask(job,@myfunction,1,{});
submit(job);
disp('FINISHED SUBMITTING')

job.wait;
result = getAllOutputArguments(job);
result{1}
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -singleCompThread -r mylclsubmit

Submit the job as a single node with two processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=2,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclsubmit.m; one processor core runs the MATLAB function.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
97986.rossmann-ad myusername      standby  myjob.sub    4645   1   2    --  00:01 R 00:00

Output shows two processor cores (TSK) on one compute node (NDS).
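
To check the NDS and TSK columns without reading the table by eye, you could filter the qstat output with awk. The following is a minimal sketch, assuming the eleven-column layout shown above; the function name summarize_qstat is hypothetical:

```shell
# Print "jobid NDS=... TSK=... S=..." for each job row of `qstat -u` output on stdin.
# Assumes the column order shown above; job rows begin with a numeric job ID.
summarize_qstat() {
    awk '$1 ~ /^[0-9]+\./ && NF >= 11 { print $1, "NDS=" $6, "TSK=" $7, "S=" $10 }'
}

# Example using the header and job row shown above.
printf '%s\n' \
  "Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time" \
  "97986.rossmann-ad myusername      standby  myjob.sub    4645   1   2    --  00:01 R 00:00" \
  | summarize_qstat
```

On the cluster, the equivalent call would be "qstat -u myusername | summarize_qstat".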

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


mylclsubmit.m
rossmann-a639.rcac.purdue.edu
FINISHED SUBMITTING

ans =

rossmann-a639.rcac.purdue.edu

0.917276 0.683883 0.866076

Output shows that processor cores on one compute node (a639) processed the entire job. One processor core processed myjob.sub and mylclsubmit.m. One processor core processed the function M-file myfunction.m. Output also displays the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.
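
As a sketch of scaling up, the helper below formats a run time given in minutes as the HH:MM:SS string that walltime= expects, then prints the resulting qsub command instead of running it. The helper name walltime_from_minutes and the 90-minute figure are assumptions for illustration; the limits of your queue still apply:

```shell
# Convert a whole number of minutes to the HH:MM:SS form used by walltime=.
walltime_from_minutes() {
    printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Print (not run) the qsub command for a 90-minute job; remove "echo" to submit.
echo qsub -l "nodes=1:ppn=2,walltime=$(walltime_from_minutes 90),gres=Parallel_Computing_Toolbox+1" myjob.sub
```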

The fifth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the second and fourth methods: it accepts either a MATLAB script M-file or a MATLAB function M-file, and it runs the MATLAB client on a compute node rather than on the front end. Because the client runs on the compute node, the program can use the 'local' configuration rather than a user-defined configuration. The only difference is that the MATLAB script M-file must end with quit; the function M-file requires no change.

Modify the MATLAB script M-file myscript.m with a quit statement. The MATLAB function M-file myfunction.m needs no change:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name);

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -singleCompThread -r myscript
# matlab -nodisplay -singleCompThread -r myfunction

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
matlab -nodisplay -singleCompThread << EOF

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name)

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;
EOF
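
In the second form of myjob.sub, the shell passes the lines between "<< EOF" and the closing EOF to MATLAB on standard input. The closing word must stand alone at the start of its line, with no trailing text. The following sketch shows the same mechanism with cat standing in for the matlab command (an assumption made so that the sketch runs without MATLAB); the function name run_inline is hypothetical:

```shell
# Feed a multi-line program to a command on standard input with a here-document.
# "cat" stands in for "matlab -nodisplay -singleCompThread".
# Quoting the delimiter ('EOF') keeps the shell from expanding $ and ` in the body.
run_inline() {
    cat << 'EOF'
A = rand(1,3);
fprintf('%f %f %f\n', A);
quit;
EOF
}

run_inline
```

Quoting the delimiter is an optional safeguard added here; the job submission file above uses the unquoted form, which also works because the MATLAB code contains no shell metacharacters.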

Submit the job as a single compute node with one processor core:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
97986.rossmann-ad myusername      standby  myjob.sub    4645   1   1    --  00:01 R 00:00

Output shows one compute node (NDS) with one processor core (TSK).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


hostname:rossmann-a639.rcac.purdue.edu

0.814724 0.905792 0.126987

Output shows that a processor core on one compute node (a639) processed the entire job. One processor core processed myjob.sub and myscript.m. Output also displays the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.
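
The names of the two output files follow from the job name and the numeric part of the job ID. The sketch below assumes that qsub prints a job ID of the form number.servername and that you have not overridden the default output file names; the function name job_output_files is hypothetical:

```shell
# Print the standard output and standard error file names for a job.
# $1 is the job name (by default the submission file name); $2 is the job ID printed by qsub.
job_output_files() {
    jobnum=${2%%.*}          # keep the digits before the first dot
    echo "$1.o$jobnum" "$1.e$jobnum"
}

job_output_files myjob.sub 97986.rossmann-adm.rcac.purdue.edu
```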

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.


MATLAB Compiler (Compiling an M-file)

The MATLAB Compiler translates an M-file into a standalone application or software component. The MATLAB Compiler Runtime (MCR) is a standalone set of shared libraries. Together, compiling and the MCR enable the execution of MATLAB files, even outside the MATLAB environment. Because the compiled application still executes MATLAB code on the MCR rather than machine code, do not expect compilation alone to speed up statements like for and while. While you do need to purchase a MATLAB Compiler license to build an executable, you may freely distribute the executable and the MCR to as many colleagues and computers as desired without license restrictions.

This section illustrates the sixth method of submitting a small, serial MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and displays three random numbers. The MATLAB function system, called with the command hostname, returns two values: a status code and the run host name.

The sixth method of job submission uses the MATLAB Compiler mcc to compile a MATLAB M-file and submits the compiled file to a PBS queue. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. During compilation, the default configuration may be either the 'local' configuration or your PBS configuration; the results will be the same. This job is completely off the front end.

The MATLAB Compiler license is a lingering license. Using the compiler locks its license for at least 30 minutes. Each time you issue the compiler command, you reset the 30-minute timer. When running mcc at the MATLAB prompt, you hold the license as long as MATLAB remains open. To release the license, quit MATLAB. Quitting MATLAB is necessary to release the license, but not sufficient. When you quit MATLAB before the 30-minute timer runs out, then the license remains checked out for the remaining time. When running mcc at the Linux prompt, the 30-minute timer starts or restarts. The license manager tracks MATLAB licenses per user per cluster. If you run the MATLAB Compiler on two different clusters, you use two Compiler licenses. To minimize your license usage, run the Compiler on one cluster.
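
The timer arithmetic described above can be sketched as follows. The function name and the epoch-second inputs are assumptions for illustration, and the sketch assumes that you have already quit MATLAB and that the license lingers exactly 30 minutes after the most recent mcc run:

```shell
# Seconds a lingering Compiler license stays checked out after the last mcc run.
LINGER_SECONDS=$(( 30 * 60 ))

# $1 is the epoch time of the most recent mcc run; $2 is the current epoch time.
# Prints the seconds remaining before the license is released (0 if already released).
license_seconds_left() {
    left=$(( $1 + LINGER_SECONDS - $2 ))
    if [ "$left" -lt 0 ]; then left=0; fi
    echo "$left"
}

# Example: mcc last ran 600 seconds (10 minutes) ago, so 1200 seconds (20 minutes) remain.
license_seconds_left 1000 1600
```

Each new mcc run moves the first argument forward, which is why repeated compilations keep the license checked out.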

Unlike compilers of typical programming languages like C, C++, and Fortran, the MATLAB Compiler does not generate machine executable code. Instead, it encrypts MATLAB code so that it cannot be viewed or modified. It also applies a wrapper around the code. Including the MCR in the compilation makes a GUI-less, standalone application which you may distribute royalty-free. You can share your compiled program with colleagues who have neither MATLAB licenses nor the MCR. Your compiled program needs no separate license at run time, so you do not need to install any license file on your colleague's machine.

The following table summarizes MATLAB license usage:

Method               MATLAB   PCT   DCS   mcc
Run within MATLAB       1      0     0     1
Run without MATLAB      0      0     0     1

Prepare either a MATLAB script M-file or a MATLAB function M-file. The method described below works for both.

The MATLAB script M-file includes the MATLAB statement quit to ensure that the compiled program terminates. Use an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display name of compute node which ran this job.
[c name] = system('hostname');
fprintf('\n\nhostname:%s\n', name)

% Display three random numbers.
A = rand(1,3);
fprintf('%f %f %f\n', A);

quit;

The MATLAB function M-file has the usual function and end statements. Use an appropriate filename, here named myfunction.m:

% FILENAME:  myfunction.m

function result = myfunction ()

    % Return name of compute node which ran this job.
    [c name] = system('hostname');
    result = sprintf('hostname:%s', name);

    % Return three random numbers.
    A = rand(1,3);
    r = sprintf('%f %f %f', A);
    result=strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_myscript.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC and verify the versions loaded. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Compile the MATLAB script M-file:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ mcc -m myscript.m

A few new files appear after the compilation:

mccExcludedFiles.log
myscript
myscript.prj
myscript_main.c
myscript_mcc_component_data.c
readme.txt
run_myscript.sh

The name of the stand-alone executable file is myscript. The name of the shell script to run this executable file is run_myscript.sh.
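
Before submitting, you may want to confirm that both files exist. The following sketch uses a hypothetical helper, compiled_ok, and prints the qsub command rather than running it:

```shell
# Succeed only if the executable and its run script both exist in the current directory.
# $1 is the base name of the compiled M-file without the .m suffix (for example, myscript).
compiled_ok() {
    for f in "$1" "run_$1.sh"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    echo "ready: $1"
}

# Example: print the submission command only when the compiled files are present.
if compiled_ok myscript 2>/dev/null; then
    echo qsub -l nodes=1,walltime=00:01:00 myjob.sub
else
    echo "compile first with mcc"
fi
```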

To obtain the name of the compute node which runs this compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname before the first echo statement so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_myscript.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/myscript $*
fi
exit
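
Rather than editing the generated script by hand, you could make the same insertion with awk. This is a sketch under the assumption that the first line beginning with echo at the left margin is the separator line shown above; the helper name add_hostname_lines is hypothetical:

```shell
# Read a script on stdin and write it to stdout with two extra commands,
# 'echo "<label>"' and 'hostname', inserted before the first line that starts with "echo ".
# $1 is the label to print, normally the name of the script itself.
add_hostname_lines() {
    awk -v label="$1" '
        !inserted && /^echo / {
            print "echo \"" label "\""
            print "hostname"
            print ""
            inserted = 1
        }
        { print }
    '
}

# Example on a shortened stand-in for the generated script.
printf '%s\n' '#!/bin/sh' 'exe_name=$0' 'echo "------"' '  echo Usage:' \
  | add_hostname_lines run_myscript.sh
```

To apply it to the real file, redirect run_myscript.sh into the function, write the result to a new file, replace the original, and restore execute permission with chmod +x.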

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
378428.rossmann-ad kes      workq    myjob.sub   18964   1   1    --  00:01 R 00:00

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a637.rcac.purdue.edu
run_myscript.sh
rossmann-a637.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.


hostname:rossmann-a637.rcac.purdue.edu

0.814724 0.905792 0.126987

Output shows the name of the compute node that ran the job submission file myjob.sub, the name of the compute node that ran the compiler-generated script run_myscript.sh, and the name of the compute node that ran the serial job: a637 in all three cases. Output also shows the three random numbers.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply this method of job submission to a MATLAB function M-file, prepare a wrapper function which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

Compile both the wrapper and the function, then submit the job:

$ mcc -m mywrapper.m myfunction.m
$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.


MATLAB Executable (MEX-file: Serial, MPI, OpenMP, Hybrid, CUDA Code)

MEX stands for MATLAB Executable. A MEX-file offers an interface which allows MATLAB code to call functions written in C, C++, or Fortran as though these external functions were built-in MATLAB functions. MATLAB also offers external interface functions that facilitate the transfer of data between MEX-files and MATLAB. A MEX-file usually starts by transferring data from MATLAB to the MEX-file; then it processes the data with the user-written code; and finally, it transfers the results back to MATLAB. This feature involves compiling then dynamically linking the MEX-file to the MATLAB program. You may wish to use a MEX-file if you would like to call an existing C, C++, or Fortran function directly from MATLAB rather than reimplementing that code as a MATLAB function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than MATLAB, you may be able to substantially improve performance over MATLAB source code, especially for statements like for and while. Areas of application include legacy code written in C, C++, or Fortran.

This section illustrates how to use the PBS qsub command to submit a small MATLAB job with a MEX-file to a PBS queue.

The first MEX example calls a C function which employs serial code to add two matrices. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license.

The second MEX example calls a C function which employs MPI to distribute the work of a message-passing program among several compute nodes. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license. This example avoids using a PCT license.

The third MEX example calls a C function which employs OpenMP to distribute the work of a shared-memory program (parallel for loop) among several threads. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license. This example avoids using a PCT license.

The fourth MEX example calls a C function which employs both MPI and OpenMP to distribute the work of a hybrid program across compute nodes and across processor cores within each compute node. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license. This example avoids using a PCT license.

For the first example, prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Matrix (component-wise) addition. */
    for (i = 0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

Combine the computational routine with a MEX-file, which contains the necessary external function interface of MATLAB. In the computational routine, change int to mwSize. Use an appropriate filename, here named matrixSum.c:

/***********************************************************
 * FILENAME:  matrixSum.c
 *
 * Adds two MxN arrays (inMatrix).
 * Outputs one MxN array (outMatrix).
 *
 * The calling syntax is:
 *
 *      outMatrix = matrixSum (inMatrix, inMatrix)
 *
 * This is a MEX-file for MATLAB.
 *
 **********************************************************/

#include "mex.h"

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, mwSize n) {
    mwSize i;

    /* Component-wise addition. */
    for (i = 0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Gateway Function */
void mexFunction (int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[]) {
    double *inMatrix_a;               /* mxn input matrix  */
    double *inMatrix_b;               /* mxn input matrix  */
    mwSize nrows_a,ncols_a;           /* size of matrix a  */
    mwSize nrows_b,ncols_b;           /* size of matrix b  */
    double *outMatrix_c;              /* mxn output matrix */

    /* Check for proper number of arguments. */
    if(nrhs!=2) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nrhs","Two inputs required.");
    }
    if(nlhs!=1) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nlhs","One output required.");
    }

    /* Get dimensions of the first input matrix. */
    nrows_a = mxGetM(prhs[0]);
    ncols_a = mxGetN(prhs[0]);
    /* Get dimensions of the second input matrix. */
    nrows_b = mxGetM(prhs[1]);
    ncols_b = mxGetN(prhs[1]);

    /* Check for equal number of rows. */
    if(nrows_a != nrows_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of rows.");
    }
    /* Check for equal number of columns. */
    if(ncols_a != ncols_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of columns.");
    }

    /* Make a pointer to the real data in the first input matrix. */
    inMatrix_a = mxGetPr(prhs[0]);
    /* Make a pointer to the real data in the second input matrix. */
    inMatrix_b = mxGetPr(prhs[1]);

    /* Make the output matrix. */
    plhs[0] = mxCreateDoubleMatrix(nrows_a,ncols_a,mxREAL);

    /* Make a pointer to the real data in the output matrix. */
    outMatrix_c = mxGetPr(plhs[0]);

    /* Call the computational routine. */
    matrixSum(inMatrix_a,inMatrix_b,outMatrix_c,nrows_a*ncols_a);
}

Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
A = [1,1,1;1,1,1]
B = [2,2,2;2,2,2]
C = matrixSum(A,B)

quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -singleCompThread -r myscript

To access the MATLAB utility mex, load a MATLAB module. The R2011b version of mex officially supports GCC Version 4.3.x, which is not available on Rossmann. This example loads GCC 4.6.2 instead; mex warns that this version is unsupported, but the example still compiles. Compile matrixSum.c into a MATLAB-callable MEX-file:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ mex matrixSum.c

The name of the MATLAB-callable MEX-file is matrixSum.mexa64. If you see the following warning, ignore it:

Warning: You are using gcc version "4.6.2".  The version
         currently supported with MEX is "4.3.4".
         For a list of currently supported compilers see:
         http://www.mathworks.com/support/compilers/current_release/

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a148.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

myscript.m:  hostname:rossmann-a148.rcac.purdue.edu

A =

     1     1     1
     1     1     1


B =

     2     2     2
     2     2     2


C =

     3     3     3
     3     3     3

Output shows the name of the compute node (a148) which processed this serial job. Because the job requested only one processor core, it shared the compute node with other jobs.

Any output written to standard error will appear in myjob.sub.emyjobid.

Rerun this serial job so that it has exclusive access to its compute node:

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 myjob.sub

For the second example, prepare a MEX file with a function containing MPI function calls. Use an appropriate filename, here named mex_mpi.c:

/* FILENAME:  mex_mpi.c */

#include "mex.h"
#include <stdio.h>
#include <mpi.h>

void f () {

    /* MPI Parameters                                                  */
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    /* All ranks initiate the message-passing environment.             */
    /* Each rank obtains information about itself and its environment. */
    MPI_Init(/*&argc, &argv*/ 0,0);             /* start MPI           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);       /* get number of ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);       /* get rank            */
    MPI_Get_processor_name(name, &len);         /* get run-host name   */

    printf("Runhost:%s   Rank:%d of %d ranks   hello, world\n", name,rank,size);

    MPI_Finalize();                             /* terminate MPI       */
    return;
}

void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    /* Check for proper number of arguments.                           */
    if(nrhs!=0) {
        mexErrMsgTxt("Zero input required.");
    } else if(nlhs>0) {
        mexErrMsgTxt("Too many output arguments.");
    }

    /* Display the name of the compute node.                           */
    f();
}

Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
% Display the names of compute nodes which run the MPI ranks.
mex_mpi();

quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
module load mvapich2/1.7_gcc-4.4.5
cd $PBS_O_WORKDIR
unset DISPLAY

# -n:                4 MPI ranks
# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
mpiexec -n 4 matlab -nodisplay -singleCompThread -r myscript

To access the MATLAB utility mex, load a MATLAB module. mex officially supports GCC Version 4.3.x, which is not available on Rossmann, and it does not support GCC 4.6 and later. Load the MVAPICH2 module built with GCC Version 4.4.5, which provides a recent version of MPI-2. Compile the C program into a MATLAB-callable MEX-file:

$ module load matlab/R2011b
$ module load mvapich2/1.7_gcc-4.4.5
$ mex mex_mpi.c CC="mpicc"

The name of the MATLAB-callable MEX-file is mex_mpi.mexa64.

Submit the job while requesting four compute nodes, each with one processor core and one MPI rank:

$ qsub -l nodes=4:ppn=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a148.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011
 

  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 
 
  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 

  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 

  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.

myscript.m:  hostname:rossmann-a148.rcac.purdue.edu
myscript.m:  hostname:rossmann-a158.rcac.purdue.edu
myscript.m:  hostname:rossmann-a159.rcac.purdue.edu
myscript.m:  hostname:rossmann-a160.rcac.purdue.edu
Runhost:rossmann-a148.rcac.purdue.edu   Rank:0 of 4 ranks   hello, world
Runhost:rossmann-a159.rcac.purdue.edu   Rank:2 of 4 ranks   hello, world
Runhost:rossmann-a158.rcac.purdue.edu   Rank:1 of 4 ranks   hello, world
Runhost:rossmann-a160.rcac.purdue.edu   Rank:3 of 4 ranks   hello, world

Output shows the names of the compute nodes (a148,a158,a159,a160) which processed this MPI job. The MPI ranks resided on different compute nodes. Also, this job shared its compute nodes with other jobs.
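
To list the distinct compute nodes without scanning the output by eye, you could filter the Runhost lines. This is a minimal sketch that assumes the line format printed by mex_mpi.c above; the function name list_runhosts is hypothetical:

```shell
# Print the distinct host names found on "Runhost:" lines read from stdin.
list_runhosts() {
    sed -n 's/^Runhost:\([^ ]*\).*/\1/p' | sort -u
}

# Example using the four Runhost lines shown above.
printf '%s\n' \
  "Runhost:rossmann-a148.rcac.purdue.edu   Rank:0 of 4 ranks   hello, world" \
  "Runhost:rossmann-a159.rcac.purdue.edu   Rank:2 of 4 ranks   hello, world" \
  "Runhost:rossmann-a158.rcac.purdue.edu   Rank:1 of 4 ranks   hello, world" \
  "Runhost:rossmann-a160.rcac.purdue.edu   Rank:3 of 4 ranks   hello, world" \
  | list_runhosts
```

Against a real job you would run "list_runhosts < myjob.sub.omyjobid"; four distinct names confirm that the four ranks ran on four different compute nodes.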

Any output written to standard error will appear in myjob.sub.emyjobid.

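Rerun this MPI job so that each rank has exclusive access to its compute node. The MATLAB script myscript.m is unchanged; the job submission file myjob.sub now reduces $PBS_NODEFILE to one line per compute node and passes that file to mpiexec, and the qsub command requests all 24 processor cores on each of four compute nodes: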

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
% Display the names of compute nodes which run the MPI ranks.
mex_mpi();

quit;

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
module load mvapich2/1.7_gcc-4.4.5
cd $PBS_O_WORKDIR
unset DISPLAY

uniq <$PBS_NODEFILE >nodefile

# -n:                4 MPI ranks
# -machinefile:      alternate source for compute node names
# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator 
mpiexec -n 4 -machinefile nodefile matlab -nodisplay -singleCompThread -r myscript

$ qsub -l nodes=4:ppn=24,walltime=00:01:00 myjob.sub

myjob.sub
rossmann-a002.rcac.purdue.edu
myscript.m:  hostname:rossmann-a002.rcac.purdue.edu
myscript.m:  hostname:rossmann-a003.rcac.purdue.edu
myscript.m:  hostname:rossmann-a005.rcac.purdue.edu
myscript.m:  hostname:rossmann-a004.rcac.purdue.edu
Runhost:rossmann-a002.rcac.purdue.edu   Rank:0 of 4 ranks   hello, world
Runhost:rossmann-a004.rcac.purdue.edu   Rank:2 of 4 ranks   hello, world
Runhost:rossmann-a003.rcac.purdue.edu   Rank:1 of 4 ranks   hello, world
Runhost:rossmann-a005.rcac.purdue.edu   Rank:3 of 4 ranks   hello, world

Output shows that each MPI rank resides on a different compute node. Each rank has exclusive access to its compute node.
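
The uniq step in myjob.sub is what places one rank per node. With ppn=24, PBS typically lists each compute node once per requested core in $PBS_NODEFILE, and uniq collapses those adjacent repeats to a single line. The sketch below demonstrates this on a shortened stand-in for the node file; the function names and the three-lines-per-node sample are assumptions for illustration:

```shell
# Collapse adjacent duplicate host names read from stdin, as "uniq <$PBS_NODEFILE" does.
one_per_node() {
    uniq
}

# Stand-in for $PBS_NODEFILE: two nodes listed three times each
# (with ppn=24 each node would appear 24 times).
sample_nodes() {
    printf '%s\n' rossmann-a002 rossmann-a002 rossmann-a002 \
                  rossmann-a003 rossmann-a003 rossmann-a003
}

sample_nodes | one_per_node
```

Because uniq removes only adjacent duplicates, this relies on the node file listing all entries for one node together.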

For the third example, prepare a MEX file with a function containing OpenMP directives and function calls. Use an appropriate filename, here named mex_openmp.c:

/* FILENAME:  mex_openmp.c */

#include "mex.h"
#include <stdio.h>
#include <unistd.h>                     /* gethostname */
#include <omp.h>

void f () {

    /* SERIAL REGION  (master thread)                                             */
    /* Parameters of the Application                                              */
    int len=30;
    char name[30];                      /* run-host name                          */
    int i;                              /* loop control variable                  */

    /* OpenMP Parameters                                                          */
    int id, nthreads;

    /* Master thread obtains information about itself and its environment.        */
    nthreads = omp_get_num_threads();   /* get number of threads                  */
    id = omp_get_thread_num();          /* get thread ID                          */
    gethostname(name,len);              /* get run-host name                      */
    printf("SERIAL REGION:   Runhost:%s   Thread:%d of %d thread    hello, world\n", name,id,nthreads);

    /* Open parallel region.                                                      */
    #pragma omp parallel shared(nthreads)
    {nthreads = omp_get_num_threads();   /* get number of threads                 */
    }  /* store value in shared nthreads of serial region                         */

/*  printf("nthreads = %d\n", nthreads);  */

    /* PARALLEL REGION                                                            */
    #pragma omp parallel for private(name,id) firstprivate(nthreads)
    for (i=0; i<2*nthreads; i++) {
        nthreads = omp_get_num_threads();   /* get number of threads              */
        id = omp_get_thread_num();          /* get thread ID                      */
        gethostname(name,len);              /* get run-host name                  */
        printf("PARALLEL LOOP:   Runhost:%s   Thread:%d of %d threads   Iteration:%2d   hello, world\n", name,id,nthreads,i);
    }   /*  lexical extent of loop-level parallelism                              */

    /* SERIAL REGION  (master thread)                                             */
    nthreads = omp_get_num_threads();   /* get number of threads                  */
    printf("SERIAL REGION:   Runhost:%s   Thread:%d of %d thread    hello, world\n", name,id,nthreads);
    return;
}

void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    /* Check for proper number of arguments. */
    if(nrhs!=0) {
        mexErrMsgTxt("Zero input required.");
    } else if(nlhs>0) {
        mexErrMsgTxt("Too many output arguments.");
    }

    /* Display the name of the compute node.               */
    /* Display the iterations which each thread processes. */
    f();
}

Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
% Display the name of the compute node which runs the OpenMP threads.
% Display the iterations which each thread processes.
mex_openmp();

quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -singleCompThread -r myscript

To access the MATLAB utility mex, load a MATLAB module. mex depends on shared libraries from GCC Version 4.3.x, which is not available on Rossmann. This example loads GCC Version 4.6.2, which implements OpenMP. MATLAB MEX does not officially support GCC 4.6 and up, so if the compilation fails, try the default GCC instead. Run a MATLAB client, compile mex_openmp.c into a MATLAB-callable MEX-file, and quit the client:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ matlab -nodisplay -singleCompThread
>> mex mex_openmp.c CFLAGS="\$CFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
>> quit;
$

The name of the MATLAB-callable MEX-file is mex_openmp.mexa64. If you see the following warning, ignore it:

Warning: You are using gcc version "4.6.2".  The version
         currently supported with MEX is "4.3.4".
         For a list of currently supported compilers see:
         http://www.mathworks.com/support/compilers/current_release/

Submit the job while requesting four processor cores on one compute node:

$ qsub -l nodes=1:ppn=4,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a001.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

myscript.m:  hostname:rossmann-a001.rcac.purdue.edu
SERIAL REGION:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:0 of 4 threads   Iteration: 0   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:0 of 4 threads   Iteration: 1   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:2 of 4 threads   Iteration: 4   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:3 of 4 threads   Iteration: 2   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:2 of 4 threads   Iteration: 5   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:1 of 4 threads   Iteration: 3   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:3 of 4 threads   Iteration: 6   hello, world
PARALLEL LOOP:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:1 of 4 threads   Iteration: 7   hello, world
SERIAL REGION:   Runhost:rossmann-a001.rcac.purdue.edu   Thread:0 of 1 thread    hello, world

Output shows the name of the compute node (a001) which processed this OpenMP job. Four threads processed the iterations of the parallel loop. Also, this job shared the compute node with other jobs.

Any output written to standard error will appear in myjob.sub.emyjobid.

Rerun this OpenMP job so that its four threads have exclusive access to their compute node:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

export OMP_NUM_THREADS=4 

# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator 
matlab -nodisplay -singleCompThread -r myscript

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 myjob.sub

For the fourth example, prepare a MEX file with a function containing MPI function calls and OpenMP directives and function calls. Use an appropriate filename, here named mex_hybrid.c:

/* FILENAME:  mex_hybrid.c */

#include "mex.h"
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

void f () {

    /* Serial Region  (master thread of an MPI rank) */
    /* MPI Parameters                                */
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    /* OpenMP Parameters */
    int id, nthreads;

    /* All ranks initiate the message-passing environment.             */
    /* Each rank obtains information about itself and its environment. */
    MPI_Init(0,0);                          /* start MPI           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* get number of ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* get rank            */
    MPI_Get_processor_name(name, &len);     /* get run-host name   */

    /* Master thread obtains information about itself and its environment. */
    nthreads = omp_get_num_threads();       /* get number of threads */
    id = omp_get_thread_num();              /* get thread            */
    printf("SERIAL REGION:     Runhost:%s   Rank:%d of %d ranks, Thread:%d of %d thread    hello, world\n", name,rank,size,id,nthreads);

    /* Open parallel region.                                                 */
    /* Each thread obtains information about itself and its environment. */
    #pragma omp parallel private(name,id,nthreads)
    {MPI_Comm_size(MPI_COMM_WORLD, &size);  /* get number of ranks   */
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get rank              */
     MPI_Get_processor_name(name, &len);    /* get run-host name     */
     nthreads = omp_get_num_threads();      /* get number of threads */
     id = omp_get_thread_num();             /* get thread            */
     printf("PARALLEL REGION:   Runhost:%s   Rank:%d of %d ranks, Thread:%d of %d threads   hello, world\n", name,rank,size,id,nthreads);
    }
    /* Close parallel region. */

    /* Serial Region  (master thread) */
    printf("SERIAL REGION:     Runhost:%s   Rank:%d of %d ranks, Thread:%d of %d thread    hello, world\n", name,rank,size,id,nthreads);

    /* Exit master thread.                                         */
    MPI_Finalize();                         /* terminate MPI       */
    return;
}

void mexFunction(int nlhs,mxArray* plhs[], int nrhs, const mxArray* prhs[])
{
    /* Check for proper number of arguments. */
    if(nrhs!=0) {
        mexErrMsgTxt("Zero input required.");
    } else if(nlhs>0) {
        mexErrMsgTxt("Too many output arguments.");
    }

    /* Display the names of the compute nodes.             */
    /* Display the rank and thread ID of each thread.      */
    f();
}

Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
% Display the names of compute nodes which run the MPI ranks.
% Display the rank and thread ID of each thread.
mex_hybrid();

quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
module load mvapich2/1.7_gcc-4.4.5
cd $PBS_O_WORKDIR
unset DISPLAY

# -n:                2 MPI ranks
# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator
mpiexec -n 2 matlab -nodisplay -singleCompThread -r myscript

To access the MATLAB utility mex, load a MATLAB module. mex depends on shared libraries from GCC Version 4.3.x, which is not available on Rossmann; use the default GCC instead (MATLAB MEX does not support GCC 4.6 and up). It implements OpenMP. Load GCC Version 4.4.5 with a recent version of MPI-2. Run a MATLAB client, compile mex_hybrid.c into a MATLAB-callable MEX-file, and quit the client:

$ module load matlab/R2011b
$ module load mvapich2/1.7_gcc-4.4.5
$ matlab -nodisplay -singleCompThread
>> mex mex_hybrid.c CC="mpicc" CFLAGS="\$CFLAGS -fopenmp" LDFLAGS="\$LDFLAGS -fopenmp"
>> quit;
$

The name of the MATLAB-callable MEX-file is mex_hybrid.mexa64.

Submit the job while requesting two compute nodes, each with one MPI rank and four OpenMP threads:

$ qsub -l nodes=2:ppn=4,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a080.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011



To get started, type one of these: helpwin, helpdesk, or demo.
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
For product information, visit www.mathworks.com.


myscript.m:  hostname:rossmann-a080.rcac.purdue.edu
myscript.m:  hostname:rossmann-a080.rcac.purdue.edu

SERIAL REGION:     Runhost:rossmann-a080.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:2 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:3 of 4 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a080.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a080.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:2 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a080.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:3 of 4 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a080.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

Output shows the name of the compute node (a080) which processed this hybrid job. Both MPI ranks resided on the same compute node. This compute node has enough processor cores to run both MPI ranks. Also, this job shared the compute node with other jobs.

Any output written to standard error will appear in myjob.sub.emyjobid.

Rerun this hybrid job so that each rank with its four threads has exclusive access to its compute node:

% FILENAME:  myscript.m

% Display the name of the compute node which runs this script.
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('myscript.m:  hostname:%s\n', name)

% Call the separately compiled and dynamically linked MEX-file.
% Display the names of compute nodes which run the MPI ranks.
% Display the rank and thread ID of each thread.
mex_hybrid();

quit;

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
module load mvapich2/1.7_gcc-4.4.5
cd $PBS_O_WORKDIR
unset DISPLAY

uniq <$PBS_NODEFILE >nodefile
export OMP_NUM_THREADS=4

# -n:                2 MPI ranks
# -machinefile:      alternate source for compute node names
# -nodisplay:        run MATLAB in text mode; X11 server not needed
# -singleCompThread: turn off implicit parallelism
# -r:                read MATLAB program; use MATLAB JIT Accelerator 
mpiexec -n 2 -machinefile nodefile matlab -nodisplay -singleCompThread -r myscript

$ qsub -l nodes=2:ppn=24,walltime=00:01:00 myjob.sub

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a193.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011



To get started, type one of these: helpwin, helpdesk, or demo.
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
For product information, visit www.mathworks.com.

myscript.m:  hostname:rossmann-a193.rcac.purdue.edu
myscript.m:  hostname:rossmann-a194.rcac.purdue.edu

SERIAL REGION:     Runhost:rossmann-a193.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a193.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a193.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a193.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:2 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a193.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:3 of 4 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a193.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

SERIAL REGION:     Runhost:rossmann-a194.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world
PARALLEL REGION:   Runhost:rossmann-a194.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a194.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:1 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a194.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:2 of 4 threads   hello, world
PARALLEL REGION:   Runhost:rossmann-a194.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:3 of 4 threads   hello, world
SERIAL REGION:     Runhost:rossmann-a194.rcac.purdue.edu   Rank:1 of 2 ranks, Thread:0 of 1 thread    hello, world

Output shows the names of the compute nodes (a193,a194) which processed this hybrid job. Each MPI rank resided on a different compute node.

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the MATLAB MEX-file:

To see online documentation about MEX-files, enter at the MATLAB command-line prompt:

>> web([docroot '/techdoc/matlab_external/f29322.html#bsabtn2-1'])

MATLAB Standalone Program

A stand-alone MATLAB program is a C, C++, or Fortran program which calls user-written M-files and the same libraries which MATLAB uses. A stand-alone program has access to MATLAB objects, such as the array and matrix classes, as well as all the MATLAB algorithms. If you would like to implement performance-critical routines in C, C++, or Fortran and still call select MATLAB functions, a stand-alone MATLAB program may be a good option. This offers the possibility of substantially improved performance over MATLAB source code, especially for statements like for and while, and still allows use of specialized MATLAB functions where useful.

This section illustrates how to submit a small, stand-alone, MATLAB program to a PBS queue. This C example calls a compiled MATLAB script which computes the inverse of a matrix. This example, when executed, does not use the MATLAB interpreter, so it neither requires nor checks out a MATLAB license.

Prepare a MATLAB function which returns the inverse of a matrix. Use an appropriate filename, here named myinverse.m:

% FILENAME:  myinverse.m

function Y = myinverse (X)

    % Display name of compute node which runs this function.
    [c name] = system('hostname');
    fprintf('\n\nhostname:%s\n', name)

    % Invert a matrix.
    Y = inv(X);

end

Prepare a second MATLAB function which displays a matrix. Use an appropriate filename, here named myprintmatrix.m:

% FILENAME:  myprintmatrix.m

function myprintmatrix(A)
         disp(A)
end

Prepare a C source file with a main function and the necessary external function interface and give it an appropriate filename, here named myprogram.c. Note that when you invoke a MATLAB function from C, the MATLAB function name appears "mangled". The C program invokes the MATLAB function myinverse using the name mlfMyinverse and the MATLAB function myprintmatrix using the name mlfMyprintmatrix. You must modify all MATLAB function names in this manner when you call them from outside MATLAB:

/* FILENAME:  myprogram.c

Inverse of:

      A                B
   -------        ------------
   1  2  1         1 -3/2  1/2
   1  1  1   -->   1  -1   0
   3 -1  1        -2  7/2 -1/2



    1.0000   -1.5000    0.5000
    1.0000   -1.0000         0
   -2.0000    3.5000   -0.5000

*/


#include <stdio.h>
#include <string.h>       /* memcpy */
#include <math.h>
#include "libmylib.h"     /* compiler-generated header file */

int main (const int argc, char ** argv) {

    mxArray *A;   /* matrix containing input data           */
    mxArray *B;   /* matrix containing result               */

    int Nrow=3, Ncol=3;
    double a[] = {1,2,1,1,1,1,3,-1,1};  /* row-major order  */
    double b[] = {1,1,3,2,1,-1,1,1,1};  /* col-major order  */
    double *ptr;

    printf("Enter myprogram.c\n");

    libmylibInitialize();     /* call mylib initialization  */

    /* Make an Nrow x Ncol MATLAB matrix (filled with zeros). */
    A = mxCreateDoubleMatrix(Nrow, Ncol, mxREAL);

    /* Initialize the MATLAB matrix.                        */
    ptr = (double *)mxGetPr(A);
    memcpy(ptr,b,Nrow*Ncol*sizeof(double));

    /* Call mlfMyinverse, the compiled version of myinverse.m. */
    mlfMyinverse(1,&B,A);

    /* Print the results. */
    mlfMyprintmatrix(B);

    /* Free the matrices allocated during this computation. */
    mxDestroyArray(A);
    mxDestroyArray(B);

    libmylibTerminate();     /* call mylib termination      */

    printf("Exit myprogram.c\n");
    return 0;
}

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

./myprogram

To access the MATLAB Compiler mcc and mbuild, load a MATLAB module. The MATLAB Compiler, mcc, depends on shared libraries from GCC Version 4.3.x. This version is not available on Rossmann, but GCC Version 4.6.2 is compatible. Compile the user-written MATLAB functions into a dynamically loaded shared library, then compile the C program:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ mcc -W lib:libmylib -T link:lib myinverse.m myprintmatrix.m
$ mbuild myprogram.c -L. -lmylib -I.

Several new files appear after the compilation:

libmylib.c
libmylib.exports
libmylib.h
libmylib.so
mccExcludedFiles.log
myinverse
myprintmatrix
myprogram
readme.txt

The name of the compiled, stand-alone MATLAB program is myprogram. The dynamically linked library of user-written MATLAB functions is libmylib.so, which the mbuild command links as mylib.

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a145.rcac.purdue.edu
Enter myprogram.c
Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: Unable to load Java Runtime Environment: libjvm.so: cannot open shared object file: No such file or directory
Warning: Disabling Java support
Hello, Thomas


hostname:rossmann-a145.rcac.purdue.edu

    1.0000   -1.5000    0.5000
    1.0000   -1.0000         0
   -2.0000    3.5000   -0.5000

Exit myprogram.c

Any output written to standard error will appear in myjob.sub.emyjobid.

MATLAB Engine Program

The MATLAB Engine allows using MATLAB as a computation engine. A MATLAB Engine program is a standalone C, C++, or Fortran program which calls functions of the Engine Library allowing you to start and end a MATLAB process, send data to and from MATLAB, and send commands to be processed in MATLAB. When employed in this manner, MATLAB is a powerful and programmable mathematical subroutine library.

This section illustrates how to submit a small MATLAB Engine program to a PBS queue. This C program calls functions of the Engine Library to compute the inverse of a matrix. Unlike the compiled stand-alone program of the previous section, an Engine program starts a MATLAB process when it runs, so it requires and checks out a MATLAB license.

Prepare a C program which computes the inverse of a matrix. Use an appropriate filename, here named myprogram.c:

/* FILENAME:  myprogram.c

A simple program to illustrate how to call MATLAB Engine functions
from a C program.  

Inverse of:

      A                B
   -------        ------------
   1  2  1         1 -3/2  1/2
   1  1  1   -->   1  -1   0
   3 -1  1        -2  7/2 -1/2

*/


#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "engine.h"
#define  BUFSIZE 256


int main ()
{
    Engine *ep;
    mxArray *A = NULL;
    mxArray *B = NULL;
    int Ncol=3, Nrow=3, col, row, ndx;
    double a[] = {1,1,3,2,1,-1,1,1,1};  /* col-major order  */
    double b[9] = {9,9,9,9,9,9,9,9,9};
    char buffer[BUFSIZE+1];

    printf("Enter myprogram.c\n");

    /* Call engOpen with an empty string. This starts a MATLAB process */
    /* on the current host using the command "matlab".                 */
    if (!(ep = engOpen(""))) {
        fprintf(stderr, "\nCan't start MATLAB engine\n");
        return EXIT_FAILURE;
    }

    buffer[BUFSIZE] = '\0';
    engOutputBuffer(ep, buffer, BUFSIZE);

    /* Create MATLAB matrices for the input (A) and the result (B). */
    A = mxCreateDoubleMatrix(Nrow, Ncol, mxREAL);
    B = mxCreateDoubleMatrix(Nrow, Ncol, mxREAL);
    memcpy((void *)mxGetPr(A), (void *)a, sizeof(a));

    /* Place the variable A into the MATLAB workspace. */
    /* Place the variable B into the MATLAB workspace. */
    engPutVariable(ep, "A", A);
    engPutVariable(ep, "B", B);

    /* Evaluate and display the inverse. */
    engEvalString(ep, "B = inv(A)");
    printf("%s", buffer);

    /* Get variable B from the MATLAB workspace.       */
    /* Copy inverted matrix to a C array named "b".    */
    mxDestroyArray(B);          /* free the placeholder before reusing B */
    B = engGetVariable(ep, "B");
    memcpy((void *)b, (void *)mxGetPr(B), sizeof(b));
    /* b is in column-major order: element (row,col) is b[col*Nrow+row]. */
    for (row=0;row<Nrow;++row) {
        for (col=0;col<Ncol;++col) {
            printf("  %5.1f", b[col*Nrow+row]);
        }
        printf("\n");
    }

    /* Free memory.                       */
    mxDestroyArray(A);
    mxDestroyArray(B);

    /* Close MATLAB engine.               */
    engClose(ep);

    /* Exit C program.                    */
    printf("Exit myprogram.c\n");
    return EXIT_SUCCESS;
}

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

./myprogram

Copy MATLAB file engopts.sh to the directory from which you intend to submit Engine jobs. Compile myprogram.c:

$ cp /apps/rhel5/MATLAB/R2011b/bin/engopts.sh .
$ mex -f engopts.sh myprogram.c

Submit the job:

$ qsub -l nodes=1,walltime=00:01:00 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a210.rcac.purdue.edu
Enter myprogram.c
>>
B =

    1.0000   -1.5000    0.5000
    1.0000   -1.0000         0
   -2.0000    3.5000   -0.5000

    1.0   -1.5    0.5
    1.0   -1.0    0.0
   -2.0    3.5   -0.5
Exit myprogram.c

Any output written to standard error will appear in myjob.sub.emyjobid.

For more information about the MATLAB Engine program:

To see online documentation about Engine programs, enter at the MATLAB command-line prompt:

>> web([docroot '/techdoc/matlab_external/f29148.html#f26499'])

MATLAB Implicit Parallelism

MATLAB implements implicit parallelism, which in general is the exploitation of parallelism inherent in many computations, such as matrix multiplication, linear algebra, and performing the same operation on a set of numbers. Implicit parallelism is a form of multithreading which uses hardware to execute multiple threads efficiently. This is different from the explicit parallelism of the Parallel Computing Toolbox. Multithreading aims to increase utilization of a single processor core by using thread-level as well as instruction-level parallelism. A language which provides implicit parallelism might allow the programmer to write the following:

set = [0 1 2 3 4 5 6 7];
result = cos(set);

The language can calculate the cosine of each member of the set independently and can spread the computation across the available processor cores of a node. The advantage is that a programmer can focus on the problem at hand without worrying about the low-level details of parallelizing the code. Implicit parallelism allows simple code to achieve a substantial improvement in computational performance without additional directives in the programmer's source code.

MATLAB offers implicit parallelism in the form of thread-parallel enabled functions. These functions run on the multicore processors of typical Linux clusters. Since these processor cores, or threads, share a common memory, many MATLAB functions contain multithreading potential. Vector operations, the particular application or algorithm, and the amount of computation (array size) contribute to the determination of whether a function runs serially or with multithreading.

If you have enabled multithreaded computation via File>Preferences>General>Multithreading in R2007a or if multithreading is on by default as it is in releases R2008a and later, you can observe the effect of implicit parallelism with the following example.

Prepare a MATLAB script M-file with thread-parallel enabled vector operations (".*" and ".^"). Use an appropriate filename, here named myscript.m:

% FILENAME:  myscript.m
% Implicit Parallelism


warning off all

% Before running, set core count of your compute cluster.
Ncorespernode = 16;        % 16 cores per node
Ntest = floor(log2(Ncorespernode))+1;

n = 5000000;               % matrix size:  5,000,000
x = zeros(n,1);
del = 2*pi/n;
vectorop   = zeros(Ntest,1);
speedup    = zeros(Ntest,1);
efficiency = zeros(Ntest,1);

% for-loop implementation
% will not trigger multithreading
tic                        % start timer
for i=1:n
    t = i*del;
    x(i) = (sin(t)*exp(-t))^3 + (t^4+5*t^-2)^0.3;
end
forloop = toc;             % stop timer

disp(forloop)



% vector implementation
% may trigger multithreading
% depending on the type of computation and work load
for i=1:Ntest
    m = 2^(i-1);
    maxNumCompThreads(m);    % set thread count 
    tic                      % start timer
    t = (1:n)*del;
    x = (sin(t).*exp(-t)).^3 + (t.^4+5*t.^-2).^0.3;
    vectorop(i) = toc;       % stop timer
    speedup(i) = vectorop(1)/vectorop(i);
    efficiency(i) = 100*speedup(i)/m;
end

fprintf('N Threads   Wall Time         Speedup   Efficiency\n')
fprintf('              (sec)            T1/TN    100*Speedup/N\n')
fprintf('---------   ---------         -------   -------------\n')
fprintf('for loop       %5.2f            N/A        N/A\n', forloop)
for i=1:Ntest
    fprintf('   %2d          %5.2f           %4.1f      %5.1f\n', 2^(i-1),vectorop(i),speedup(i),efficiency(i))
end

The script turns off all warnings because calling the MATLAB function maxNumCompThreads() triggers a warning that the function is deprecated and will be removed in a future release of MATLAB. For now, the function works as advertised.

Prepare a MATLAB script M-file which submits myscript.m with the MATLAB 'local' configuration and displays the diary. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

job=batch('myscript','Configuration','local','CaptureDiary',true);

job.wait;
job.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator 
matlab -nodisplay -r mylclbatch

Submit the job to a single compute node and request exclusive access to it. This example ran on a compute node with 16 processor cores, so it requests all 16:

$ qsub -l nodes=1:ppn=16,walltime=00:05:00 myjob.sub

View results in the file for all standard output, myjob.sub.omyjobid:

N Threads   Wall Time         Speedup   Efficiency
              (sec)            T1/TN    100*Speedup/N
---------   ---------         -------   -------------
for loop       16.67            N/A        N/A
    1           4.58            1.0      100.0
    2           2.37            1.9       96.8
    4           1.22            3.8       93.9
    8           0.65            7.1       88.3
   16           0.36           12.9       80.5

Results show the performance of a 16-core compute node. First, output shows the significant difference in performance between code with a for loop, which does not trigger implicit parallelism, and code with vector operations, which does. Second, output shows that as the number of threads increases, the wall time decreases. This is the effect of implicit parallelism. Speedup is the ratio of the wall time for one thread to the wall time for N threads; efficiency is speedup divided by N, expressed as a percentage. Speedup is not perfect: when the number of threads doubles, the wall time falls to slightly more than half. Still, efficiency falls off slowly, from 100% with one thread to 80.5% with 16 threads.
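The speedup and efficiency columns can be recomputed from the wall times alone. The following is a minimal sketch in shell, assuming awk is available; the function name speedup_row is hypothetical and is not part of MATLAB or PBS. Because it starts from the rounded wall times printed above, its results can differ from the table in the last digit.

```shell
#!/bin/sh
# speedup_row T1 TN N   (hypothetical helper, for illustration only)
# Prints: N, speedup = T1/TN, efficiency = 100*speedup/N
speedup_row() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        s = t1 / tn
        printf "%d %.1f %.1f\n", n, s, 100 * s / n
    }'
}

# Wall times taken from the output above
speedup_row 4.58 4.58 1
speedup_row 4.58 1.22 4
speedup_row 4.58 0.36 16
```

Each output line lists the thread count, the speedup, and the efficiency in percent.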

Implicit parallelism comes with disadvantages. It reduces the control that the programmer has over the parallel execution of the program, resulting sometimes in less-than-optimal parallel efficiency. This appears in the speedup column in the example above. When the number of threads increases from one to 16, speedup reaches only 12.9, noticeably less than 16. Also, implicit parallelism can make debugging difficult.

MATLAB is always greedy about what it can use. Its implicit parallelism discovers how many processor cores physically reside on a compute node and uses all of them. For example, Rossmann has 24 processor cores per compute node. This number is the return value of maxNumCompThreads() regardless of how many processor cores you requested or how many processor cores other jobs are currently using. These job submissions yield the same value for maxNumCompThreads():

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 myjob.sub
$ qsub -l nodes=1:ppn=4,walltime=00:01:00 myjob.sub

When your job triggers implicit parallelism, it attempts to allocate its threads on all processor cores of the compute node on which the MATLAB client is running, including processor cores running other jobs. This competition can degrade the performance of all jobs running on the node. If an affected processor core participates in a larger, distributed-memory, parallel job involving many other nodes, then performance degradation can become much more widespread.

Cluster performance is partially the responsibility of MATLAB users. When you know that you are coding a serial job but are unsure whether you are using thread-parallel enabled operations, run MATLAB with implicit parallelism turned off. Beginning with R2009b, you can turn multithreading off by starting MATLAB with -singleCompThread:

$ matlab -nodisplay -singleCompThread -r mymatlabprogram

When you are using implicit parallelism, request exclusive access to a compute node by requesting all cores which are physically available on a node of a compute cluster:

$ qsub -l nodes=1:ppn=24,walltime=00:01:00 myjob.sub
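The two recommendations above (serial code runs single-threaded; implicitly parallel code takes the whole node) can be combined in a job submission file. This is a minimal sketch, assuming the 24 cores per node stated for Rossmann. The function name matlab_thread_flag is hypothetical, and treating any request smaller than a full node as a reason to disable multithreading is an assumption of this sketch, not a site rule.

```shell
#!/bin/sh
CORES_PER_NODE=24    # Rossmann: 24 processor cores per compute node

# matlab_thread_flag PPN   (hypothetical helper)
# Prints -singleCompThread unless the job holds every core on the node.
matlab_thread_flag() {
    if [ "$1" -lt "$CORES_PER_NODE" ]; then
        echo "-singleCompThread"
    fi
}

# Example: show the MATLAB command line for a 4-core request
echo matlab -nodisplay $(matlab_thread_flag 4) -r mymatlabprogram
```

For a 4-core request the sketch prints a command line that includes -singleCompThread; for a 24-core request the flag is omitted.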

Parallel Computing Toolbox commands, such as spmd, preempt multithreading. Note that opening a MATLAB pool neither prevents multithreading nor changes the thread count in effect.


MATLAB Parallel Computing Toolbox (parfor)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. PCT enables task and data parallelism on a multicore processor. On the 'local' configuration, it offers a shared-memory computing environment with a maximum of eight MATLAB workers (labs) in version R2009a and 12 workers in version R2011a, in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the fine-grained parallelism of a parallel for loop (parfor) in a pool job. Areas of application include for loops with independent iterations.

This section illustrates eight methods of submitting a small, parallel MATLAB program with a parallel loop (parfor statement) as a batch MATLAB pool job to a PBS queue. This MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each iteration of the parfor loop. The call system('hostname') returns two values: a numerical status code and the name of the compute node on which it runs.

The first method runs on a front end a MATLAB client which calls the MATLAB batch() function with a user-defined PBS configuration. Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out seven licenses: one MATLAB license for the client running on the front end, one PCT license, and five DCS licenses. One DCS license runs the batch session (including the two serial regions in the MATLAB M-code) while the other four run the iterations of the parfor loop. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running function batch() and quitting MATLAB. The five DCS licenses remain active between running function batch() and job completion, even when the serial portion of the program is running. The DCS licenses do not appear in the output of function license().

The second method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB batch() function with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The third method runs on a front end a MATLAB client which calls the MATLAB submit() function with the 'torque' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'torque' scheduler, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out seven licenses: one MATLAB license for the client running on the front end, one PCT license, and five DCS licenses. One DCS license runs the batch session (including the two serial regions in the MATLAB M-code) while the other four run the iterations of the parfor loop. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. The five DCS licenses remain active between running function submit() and job completion, even when the serial portion of the program is running. The DCS licenses do not appear in the output of function license().

The fourth method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB submit() function with the 'local' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'local' scheduler, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The fifth method uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS configuration which scatters the MATLAB workers onto different compute nodes. Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the iterations of the parfor loop. This job is completely off the front end.

The sixth method uses the PBS qsub command to submit to a compute node a MATLAB client which interprets an M-file with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter; so, it requires and checks out one license: one MATLAB license for the client running on a compute node. The MATLAB license remains active between starting and quitting MATLAB. This job is completely off the front end.

The seventh method uses the MATLAB compiler mcc and the default parallel configuration set to a PBS configuration to compile a MATLAB M-file and submits the compiled file to a PBS queue. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. This method uses a PBS configuration during compilation. Since it uses a PBS configuration, this method, when executed, uses the MATLAB Distributed Computing Server; so, it requires and checks out four DCS licenses during the execution of the parfor statement. The serial portions of this job do not use a DCS license. This job is completely off the front end.

The eighth method uses the MATLAB Compiler mcc and the default parallel configuration set to the 'local' configuration to compile a MATLAB M-file and submits the compiled file to a PBS queue. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses the 'local' configuration, this method, when executed, uses no license. (Support for running compiled PCT code on the local configuration was added in R2011a; this feature removes the need for DCS licenses in some cases.) This job is completely off the front end.

The MATLAB Compiler license is a lingering license. Using the compiler locks its license for at least 30 minutes. Each time you issue the compiler command, you reset the 30-minute timer. When running mcc at the MATLAB prompt, you hold the license as long as MATLAB remains open. To release the license, quit MATLAB. Quitting MATLAB is necessary to release the license, but not sufficient. When you quit MATLAB before the 30-minute timer runs out, then the license remains checked out for the remaining time. When running mcc at the Linux prompt, the 30-minute timer starts or restarts. The license manager tracks MATLAB licenses per user per cluster. If you run the MATLAB Compiler on two different clusters, you use two Compiler licenses. To minimize your license usage, run the Compiler on one cluster.

You can share your compiled program with colleagues who do not have MATLAB licenses. Your compiled program carries its required licenses within itself, so you do not need to install any license file on your colleague's machine.

The following table summarizes MATLAB license usage:

Method  Description                                         MATLAB  PCT  DCS                     mcc  Limitations
------  --------------------------------------------------  ------  ---  ----------------------  ---  -----------
1       batch() with user-defined PBS configuration         1       1    Matlabpool + 1          0    number of MATLAB, PCT, DCS licenses purchased
2       batch() with 'local' configuration, qsub            1       1    0                       0    local configuration with 8 (R2009a) and 12 (R2011a) workers
3       submit() with 'torque' scheduler                    1       1    MaximumNumberOfWorkers  0    number of MATLAB, PCT, DCS licenses purchased
4       submit() with 'local' scheduler, qsub               1       1    0                       0    local scheduler with 8 (R2009a) and 12 (R2011a) workers
5       qsub with user-defined PBS configuration            1       1    pool size               0    number of MATLAB, PCT, DCS licenses purchased
6       qsub with 'local' configuration                     1       1    0                       0    local configuration with 8 (R2009a) and 12 (R2011a) workers
7       Compiler with user-defined PBS configuration, qsub  0       0    pool size               1    number of DCS licenses purchased
8       Compiler with the 'local' configuration, qsub       0       0    0                       1    local configuration with 8 (R2009a) and 12 (R2011a) workers
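
As a worked example of the DCS column, the following sketch prints the number of DCS licenses that each method would check out for a given pool size. It is an illustration only: the function name dcs_licenses is hypothetical, and for method 3 it assumes that MaximumNumberOfWorkers is set to the pool size plus one, as in the five-worker example in this section.

```shell
#!/bin/sh
# dcs_licenses METHOD POOLSIZE   (hypothetical helper)
dcs_licenses() {
    method=$1
    pool=$2
    case "$method" in
        1|3) echo $((pool + 1)) ;;   # pool workers plus the batch session
        5|7) echo "$pool" ;;         # pool workers only
        2|4|6|8) echo 0 ;;           # 'local' configuration: no DCS licenses
        *) echo "unknown method: $method" >&2; return 1 ;;
    esac
}

dcs_licenses 1 4    # prints 5
dcs_licenses 5 4    # prints 4
dcs_licenses 8 4    # prints 0
```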

Prepare a MATLAB pool program in the form of a MATLAB script M-file and a MATLAB function M-file with appropriate filenames, here named myscript.m and myfunction.m:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;        % get elapsed time in parallel loop
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "result" is a "reduction" variable.
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    numlabs = matlabpool('size');
    r = sprintf('                hostname                         numlabs  labindex  iteration');
    result = strvcat(result,r);
    r = sprintf('                -------------------------------  -------  --------  ---------');
    result = strvcat(result,r);
    tic;

    % PARALLEL LOOP 
    parfor i = 1:8
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
        result = strvcat(result,r);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel loop
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('Elapsed time in parallel loop:   %f', elapsed_time);
    result = strvcat(result,r);

end

Both M-files display the names of all compute nodes which run the job. The parfor statement does not set the values of variables numlabs or labindex, but function matlabpool() can return the pool size. The script M-file uses fprintf() to display the results. The function M-file returns a single value which contains a concatenation of the results.

The execution of a pool job starts with a worker (batch session) executing the statements of the first serial region up to the parfor block, when it pauses. A set of workers (the pool) executes the parfor block. When they finish, the batch session resumes by executing the second serial region. The code displays the names of the compute nodes running the batch session and the worker pool.

The first method of job submission uses function batch() to offload from the MATLAB client running on the front end to a batch session running on a worker (compute node) the responsibilities of defining the parfor loop, which runs on another set of workers called the pool, and accumulating the results. The batch session and the pool cooperate on processing a single program. The batch session distributes the independent iterations of the loop to the workers of the pool. The workers of the pool process simultaneously their respective portions of the workload of the parallel loop so that the parallel loop might run faster than the equivalent serial version. A pool size of N requires N+1 workers (processor cores). The source code is a MATLAB M-file (MATLAB function batch() accepts either a script M-file or a function M-file).

On the front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, view the current default parallel configuration; most likely, it is the 'local' configuration. Call MATLAB function batch() to make a four-lab pool on which to run the MATLAB code in the file myscript.m. In the call, replace the 'local' configuration by specifying your PBS configuration which you previously designed with the Configuration Manager (using the 'local' configuration at this point will run the parfor loop on the front end). This particular PBS configuration scatters the labs to different compute nodes to verify that a four-lab parfor loop actually uses five processor cores. Also, capture job output in the diary. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, view results by viewing the diary. When you do not want to keep the files of your job, use MATLAB function destroy() to erase them from your current working directory. After this job finishes, the default parallel configuration remains the 'local' configuration.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> disp(defaultParallelConfig);
local
>> pjob=batch('myscript','Matlabpool',4,'Configuration','mypbsconfig','CaptureDiary',true);
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> license('inuse')
distrib_computing_toolbox
matlab
>> disp(pjob.get('State'))
finished
>> license('inuse')
distrib_computing_toolbox
matlab
>> pjob.diary;
>> ls -l
>> pjob.destroy;
>> ls -l
>> disp(defaultParallelConfig);
local
>> quit;
$

The license name distrib_computing_toolbox refers to the Parallel Computing Toolbox.

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115204.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --

Job status shows that the MATLAB client submitted this job as five compute nodes (NDS), each with one processor core (TSK) and with a requested wall time of one minute. The call to function batch() specifies four labs to evaluate the iterations of the parallel loop (parfor statement). The fifth lab runs the batch session, myscript.m, defines the parfor loop, assigns loop iterations to the other four labs, and accumulates the results. This arrangement explains the presence of five DCS licenses.
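
The columns discussed above can also be pulled out of a qstat line programmatically. This is a minimal sketch, assuming the whitespace-separated column layout shown above (NDS in field 6, TSK in field 7, requested time in field 9, state in field 10); the function name qstat_fields is hypothetical. A job name that contains spaces would shift the fields and break this simple parser.

```shell
#!/bin/sh
# qstat_fields   (hypothetical helper)
# Reads one job line of `qstat -u` output on stdin and prints
# the node count, task count, requested wall time, and state.
qstat_fields() {
    awk '{ printf "nodes=%s tasks=%s walltime=%s state=%s\n", $6, $7, $9, $10 }'
}

# Job line copied from the qstat output above
echo "115204.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --" | qstat_fields
```

For the job line above, the sketch prints nodes=5 tasks=5 walltime=00:01 state=Q.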

View job output from the diary:

SERIAL REGION:  hostname:rossmann-a008.rcac.purdue.edu

                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a057.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a073.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a074.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a075.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a057.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a073.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a074.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a075.rcac.purdue.edu            4         1          8

SERIAL REGION:  hostname:rossmann-a008.rcac.purdue.edu
Elapsed time in parallel loop:   5.585185

Output shows that the nonnegative scalar integer which is the value of the property Matlabpool (4) defined the number of labs in the pool which processed the parallel loop. Also, output shows the fifth lab which runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the MATLAB pool requires the lab running the batch session in addition to N labs in the pool, there must be at least N+1 processor cores available on the cluster.

The MATLAB client "scattered" the five processor cores (five MATLAB labs) among five different compute nodes. A processor core of one compute node (a008) processed the batch session, including the two serial regions, while four processor cores of four different compute nodes (a057, a073, a074, a075) processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give each lab in the pool a unique value for variable labindex. Output shows the iterations of the parfor loop in scrambled order since the labs process each iteration independently of the other iterations. You cannot assume that since there are four processor cores and eight iterations, each processor core processes two consecutive iterations of the parfor loop. One compute node (a074) processed two nonconsecutive iterations: iterations 5 and 7. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. The evidence that several processor cores process the eight iterations of the parfor loop in parallel is that the processing time decreases as the number of processor cores increases.

Running this example with larger MATLAB pool sizes yields shorter runtimes:

Pool Size   Time (seconds)
---------   --------------
    1           18.1
    2            9.2
    4            5.0
    8            3.6
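
These times can be compared with a simple lower bound. Each of the eight iterations pauses for two seconds, so a pool of N labs needs at least ceil(8/N) rounds of two seconds each. The following is a minimal sketch of that arithmetic, assuming the iterations are spread as evenly as possible and ignoring startup and communication overhead; the function name ideal_parfor_time is hypothetical.

```shell
#!/bin/sh
# ideal_parfor_time ITERATIONS PAUSE_SECONDS POOLSIZE   (hypothetical helper)
# Lower bound on loop time: ceil(ITERATIONS / POOLSIZE) * PAUSE_SECONDS
ideal_parfor_time() {
    rounds=$(( ($1 + $3 - 1) / $3 ))    # integer ceiling division
    echo $(( rounds * $2 ))
}

for pool in 1 2 4 8; do
    echo "pool=$pool ideal=$(ideal_parfor_time 8 2 "$pool")s"
done
```

The bounds are 16, 8, 4, and 2 seconds. The measured times sit above them in every case; the difference is overhead that this simple model ignores.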

After calling the MATLAB function batch(), you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with the PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy().

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','Configuration','mypbsconfig');
>> pjob=findJob(sched,'State','finished');
>> pjob.diary;
>> pjob.destroy;
>> quit;
$

To apply the first method of job submission to a function M-file, use one of the following sequences:

>> pjob=batch('myfunction','Matlabpool',4,'Configuration','mypbsconfig','CaptureDiary',true);
>> disp(pjob.get('State'))
finished
>> pjob.diary

>> pjob=batch('myfunction',1,{},'Matlabpool',4,'Configuration','mypbsconfig');                    
>> disp(pjob.get('State'))
finished
>> result=getAllOutputArguments(pjob);
>> result{1}

>> pjob=batch(@myfunction,1,{},'Matlabpool',4,'Configuration','mypbsconfig');
>> disp(pjob.get('State'))
finished
>> result=getAllOutputArguments(pjob);
>> result{1}

Function batch() offers an alternate mode of operation. If you exclude the property Configuration and its value from the list of arguments of function batch(), then batch() reads the default parallel configuration. Before calling batch(), set the default parallel configuration with your PBS configuration which you previously designed with the Configuration Manager. This change of the default parallel configuration is immediate and permanent; the PBS configuration remains the default parallel configuration during subsequent runs of MATLAB. MATLAB function batch() reads the default parallel configuration to discover your instructions about submission.

To scale up this method to handle a real application, use the Configuration Manager in the Parallel menu to increase the wall time to accommodate a longer running job. Enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch(). The maximum possible size of the pool is one less than the number of DCS licenses purchased.

The second method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first method since it uses function batch() and the same MATLAB M-file, either myscript.m or myfunction.m. This method differs from the first because the MATLAB client runs on a compute node rather than on the front end and the client uses the MATLAB 'local' configuration rather than a user-defined PBS configuration.

Prepare a MATLAB script M-file that calls MATLAB function batch() which makes a four-lab pool on which to run the MATLAB code in the file myscript.m, which specifies the 'local' configuration, and which captures job output in the diary. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

pjob=batch('myscript','Matlabpool',4,'Configuration','local','CaptureDiary',true);
pjob.wait;
pjob.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclbatch

Submit the job as a single compute node with six processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=6,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m; one processor core runs the two serial regions of the MATLAB M-file; four processor cores run the iterations of the parallel for loop.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername      standby  myjob.sub   30197   1   6    --  00:01 R 00:00

Job status shows six processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


mylclbatch.m
rossmann-a000.rcac.purdue.edu
SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu

                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          8

SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu
Elapsed time in parallel loop:   5.411486

Output shows that the nonnegative scalar integer which is the value of the property Matlabpool (4) defined the number of labs in the pool which processed the parallel for loop. While output does not explicitly show the fifth lab, that lab runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the MATLAB pool requires the worker running the batch session in addition to N labs in the pool, there must be at least N+1 processor cores available on the cluster.

Output shows that processor cores on one compute node (a000) processed the entire job. One processor core processed myjob.sub and mylclbatch.m. One processor core processed the batch session myscript.m, which includes the two serial regions, while four processor cores processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. Output shows the iterations of the parfor loop in scrambled order since the labs process each iteration independently of the other iterations. You cannot assume that since there are four processor cores and eight iterations, each processor core processes two consecutive iterations of the parfor loop. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. What proves that several processor cores are processing in parallel the eight iterations of the parfor loop is that the processing time decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply the second method of job submission to a function M-file, modify mylclbatch.m with one of the following sequences:

pjob=batch('myfunction','Matlabpool',4,'Configuration','local','CaptureDiary',true);
pjob.wait;
pjob.diary

pjob=batch('myfunction',1,{},'Matlabpool',4,'Configuration','local');
pjob.wait;
result = getAllOutputArguments(pjob);
result{1}

pjob=batch(@myfunction,1,{},'Matlabpool',4,'Configuration','local');
pjob.wait;
result = getAllOutputArguments(pjob);
result{1}

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch(). The maximum possible size of the pool is one less than the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must be two greater than the value of Matlabpool.

Specifying a MATLAB pool with 12 labs means a total of 13 workers, which exceeds the 12-worker limit of the 'local' configuration in MATLAB R2011b. The relevant lines of code and the error follow:

pjob=batch('myscript','Matlabpool',12,'Configuration','local','CaptureDiary',true);

$ qsub -l nodes=1:ppn=14,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using batch (line 172)
You requested a minimum of 13 workers but only 12 workers are allowed with the
local scheduler.

Error in mylclbatch (line 6)
pjob=batch('myscript','Matlabpool',12,'Configuration','local','CaptureDiary',true);}
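
The sizing rules for the second method can be checked before submitting. This is a minimal sketch, assuming the 12-worker limit of the 'local' configuration reported in the error above; the function name local_ppn is hypothetical.

```shell
#!/bin/sh
LOCAL_MAX_WORKERS=12    # 'local' configuration limit reported by R2011b

# local_ppn POOLSIZE   (hypothetical helper)
# Prints the ppn value to request: pool labs + batch session + MATLAB client.
# Fails when pool + 1 workers would exceed the 'local' limit.
local_ppn() {
    workers=$(( $1 + 1 ))
    if [ "$workers" -gt "$LOCAL_MAX_WORKERS" ]; then
        echo "pool of $1 needs $workers workers; limit is $LOCAL_MAX_WORKERS" >&2
        return 1
    fi
    echo $(( $1 + 2 ))
}

local_ppn 4     # prints 6
local_ppn 11    # prints 13
local_ppn 12 || echo "pool too large for the 'local' configuration"
```

A pool of four labs gives ppn=6, which matches the qsub command used for the second method.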

The third method of job submission uses function submit() to offload from the MATLAB client running on the front end to a batch session running on a worker (compute node) the responsibilities of defining the parfor loop, which runs on another set of workers called the pool, and accumulating the results. The batch session and the pool cooperate on processing a single program. The batch session distributes the independent iterations of the loop to the workers of the pool. The workers of the pool process simultaneously their respective portions of the workload of the parallel loop so that the parallel loop might run faster than the equivalent serial version. A pool size of N requires N+1 workers (processor cores). The source code is a MATLAB function M-file (MATLAB function submit() accepts only a function M-file).

Prepare a MATLAB script M-file which finds the MATLAB 'torque' scheduler, which specifies scattering five processor cores to five different compute nodes and one minute of wall time for this short job and the number of PCT and DCS licenses, which defines a MATLAB pool job with five workers (labs) and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mypbssubmit.m:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler', 'type', 'torque');
set(sched,'ClusterMatlabRoot',matlabroot);
set(sched,'SubmitArguments','-l walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+5');
pjob = createMatlabPoolJob(sched);
set(pjob,'MinimumNumberOfWorkers',5);
set(pjob,'MaximumNumberOfWorkers',5);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

On a front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, run mypbssubmit. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, get all output arguments into a cell array and display the results. When you do not want to keep the files of a job, use MATLAB function destroy() to erase them from your current working directory.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> mypbssubmit
FINISHED SUBMITTING
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> disp(pjob.get('State'))
finished
>> result = getAllOutputArguments(pjob)  

result = 

    [13x77 char]

>> result{1}
>> ls -l
>> pjob.destroy;
>> ls -l
>> quit
$

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115265.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --

Job status shows that the MATLAB client submitted this job as five compute nodes (NDS), each with one processor core (TSK), and the requested wall time of one minute. The script mypbssubmit.m specifies five as both the minimum and maximum number of workers for the MATLAB pool job. Four labs evaluate the iterations of the parfor loop. The fifth lab runs the batch session, myfunction.m, including the two serial regions; it defines the parfor loop, assigns loop iterations to the other four labs, and accumulates the results. This arrangement explains the presence of five DCS licenses.
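
When checking many jobs, the columns discussed above can be pulled out of the qstat listing with a short filter. The sketch below is an assumption-laden illustration: the function name job_summary is hypothetical, and the column positions are taken from the listing shown in this section, so a job name that contains spaces would shift them.

```shell
# Hypothetical helper: read "qstat -u" output on standard input and print
# the node count (NDS), core count (TSK), and state (S) of the job whose
# numeric ID is $1. Column positions follow the listing in this section.
job_summary() {
    awk -v id="$1" 'index($1, id ".") == 1 { print "NDS=" $6, "TSK=" $7, "S=" $10 }'
}

# Demonstration with the job line shown above.
printf '%s\n' \
    '115265.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --' \
    | job_summary 115265

# Usage with live data (myusername and myjobid are placeholders):
#   qstat -u myusername | job_summary myjobid
```

Header lines never match because their first field does not begin with the job ID followed by a period.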

View job output:

SERIAL REGION:  hostname:rossmann-a001.rcac.purdue.edu
                        
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a009.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a009.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a013.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a013.rcac.purdue.edu            4         1          8
PARALLEL LOOP:  rossmann-a010.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a010.rcac.purdue.edu            4         1          3

SERIAL REGION:  hostname:rossmann-a001.rcac.purdue.edu                        
Elapsed time in parallel loop:   4.904323

Output shows that the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (5) determined the number of labs in the pool (4) which processed the parallel loop. Output also shows the fifth lab, which runs the batch session, including the two serial portions of the MATLAB pool program. Because the N workers of a MATLAB pool job include the worker running the batch session, there must be at least N processor cores available on the cluster.

The MATLAB client scattered the five processor cores (five MATLAB labs) among five different compute nodes. A processor core of one compute node (a001) processed the batch session, including the two serial regions, while four processor cores of four different compute nodes (a009,a010,a012,a013) processed the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give each lab in the pool a unique value for variable labindex. Output shows the iterations of the parfor loop in scrambled order since the labs process each iteration independently of the other iterations. You cannot assume that since there are four processor cores and eight iterations, each processor core processes two consecutive iterations of the parfor loop. Two compute nodes (a012 and a013) processed two nonconsecutive iterations. While this example evenly distributed the iterations among the four labs, you cannot assume that MATLAB will use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. The evidence that several processor cores process the eight iterations of the parfor loop in parallel is that the processing time decreases as the number of processor cores increases.

After the "FINISHED SUBMITTING" message, you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy().

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','type','torque');
>> pjob=findJob(sched,'State','finished');
>> result=getAllOutputArguments(pjob);
>> result{1}
>> pjob.destroy;
>> quit
$

For practice, modify mypbssubmit.m to rerun this example as a single compute node with five processor cores:

set(sched,'SubmitArguments','-l nodes=1:ppn=5,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+5');

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115273.rossmann-a myusername standby  Job1          --    1   5    --  00:01 Q   --

The MATLAB client submitted this job as a single compute node (NDS) with five processor cores (TSK). The lab that runs the batch session and the four labs that run the parfor loop reside on the same compute node.

View job output:

SERIAL REGION:  hostname:rossmann-a015.rcac.purdue.edu
                        
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          8
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a015.rcac.purdue.edu            4         1          7

SERIAL REGION:  hostname:rossmann-a015.rcac.purdue.edu                        
Elapsed time in parallel loop:   4.926231

Output shows that processor cores of one compute node (a015) processed the entire job.

Function submit() offers an alternate mode of operation. It can read your PBS configuration made previously with the Configuration Manager. Here is an alternate version of mypbssubmit.m which relies on mypbsconfig for the several details of job submission:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','Configuration','mypbsconfig');
pjob = createMatlabPoolJob(sched);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

To scale up this method to handle a real application, increase the wall time in mypbssubmit.m to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of DCS licenses purchased. Finally, increase the value of MATLAB_Distrib_Comp_Server to match the value of MinimumNumberOfWorkers.
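
The relationship between pool size, worker count, and DCS licenses for this method can be sketched as a helper that prints the SubmitArguments value. The function name torque_submit_args is hypothetical; the resource string follows the form used in mypbssubmit.m, and the result still has to be pasted into the set() call by hand.

```shell
# Hypothetical helper: print the SubmitArguments value for a MATLAB pool job
# with $1 labs in the pool and a wall time of $2 (HH:MM:SS).
torque_submit_args() {
    pool=$1
    walltime=$2
    workers=$((pool + 1))    # one extra worker runs the batch session
    # One PCT license for the client; one DCS license per worker.
    printf '%s\n' "-l walltime=${walltime},gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+${workers}"
}

torque_submit_args 4 00:01:00
```

For a pool of four labs the helper reproduces the string used in the example; MinimumNumberOfWorkers and MaximumNumberOfWorkers must be raised to the same worker count.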

The fourth method of job submission uses the PBS qsub command to submit a pool job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the third method since it uses function submit() and the same MATLAB function M-file myfunction.m (MATLAB function submit() accepts only a function M-file). This method differs from the third because the MATLAB client runs on a compute node rather than on the front end and the client uses the MATLAB 'local' scheduler rather than the MATLAB 'torque' scheduler.

Prepare a MATLAB script M-file which finds the MATLAB 'local' scheduler, which defines a MATLAB pool job with five workers (labs) and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mylclsubmit.m:

% FILENAME:  mylclsubmit.m

!echo "mylclsubmit.m"
!hostname

sched = findResource('scheduler', 'type', 'local');
set(sched,'ClusterMatlabRoot',matlabroot);
pjob = createMatlabPoolJob(sched);
set(pjob,'MinimumNumberOfWorkers',5)
set(pjob,'MaximumNumberOfWorkers',5)
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

pjob.wait;
result = getAllOutputArguments(pjob);
result{1}
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclsubmit

Submit the job as a single compute node with six processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=6,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclsubmit.m; one processor core runs the two serial regions of the batch session; four processor cores run the iterations of the parallel for loop.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115280.rossmann-ad myusername      standby  myjob.sub   19225   1   6    --  00:01 R 00:00

Job status shows six processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a012.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


mylclsubmit.m
rossmann-a012.rcac.purdue.edu
FINISHED SUBMITTING

ans =

SERIAL REGION:  hostname:rossmann-a012.rcac.purdue.edu

                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          8
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a012.rcac.purdue.edu            4         1          3

SERIAL REGION:  hostname:rossmann-a012.rcac.purdue.edu
Elapsed time in parallel loop:   6.203370

Output shows that the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (5) determined the number of labs in the pool (4) which processed the parallel loop. While output does not explicitly show the fifth lab, that lab runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the N workers of a MATLAB pool job include the worker running the batch session, there must be at least N processor cores available on the cluster.

The processor cores of one compute node (a012) processed the entire job. One processor core processed myjob.sub and mylclsubmit.m. One processor core processed the batch session myfunction.m, which includes the two serial regions, while four processor cores processed the parallel for loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop; labs process each iteration independently of the other iterations, so output from the iterations is in random order. You cannot assume that since there are four processor cores and eight iterations, each processor core processes two iterations of the parfor loop. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. The evidence that several processor cores process the eight iterations of the parfor loop in parallel is that the processing time decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool by raising the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers, which appear as arguments in the calls of function set(). The maximum value of these properties is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a); because one worker runs the batch session, the largest possible pool is one lab smaller. Finally, the value of ppn must be one greater than the value of MaximumNumberOfWorkers.

Specifying 13 workers to achieve a MATLAB pool with 12 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

set(pjob,'MinimumNumberOfWorkers',13);
set(pjob,'MaximumNumberOfWorkers',13);

$ qsub -l nodes=1:ppn=14,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using distcomp.simpleparalleljob/pSetMinimumNumberOfWorkers (line 59)
MinimumNumberOfWorkers must be the same as or less than MaximumNumberOfWorkers
for a job

Error in mylclsubmit (line 9)
set(pjob,'MinimumNumberOfWorkers',13);
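
A check like the following could catch an oversized worker count before the job reaches the queue. It is a sketch under stated assumptions: the function name local_submit_resources is hypothetical, and the limit of 12 workers is taken from the error message for the 'local' scheduler rather than queried from MATLAB.

```shell
# Hypothetical helper: print the qsub resource list for the fourth method.
# $1 is the value of MinimumNumberOfWorkers and MaximumNumberOfWorkers,
# $2 the wall time, $3 the worker limit of the 'local' configuration
# (assumed to be 12 unless given).
local_submit_resources() {
    workers=$1
    walltime=$2
    max_workers=${3:-12}
    if [ "$workers" -gt "$max_workers" ]; then
        echo "$workers workers requested; 'local' allows $max_workers" >&2
        return 1
    fi
    # One additional core runs myjob.sub and the MATLAB client.
    echo "nodes=1:ppn=$((workers + 1)),walltime=${walltime},gres=Parallel_Computing_Toolbox+1"
}

local_submit_resources 5 00:01:00

# Usage:
#   qsub -l "$(local_submit_resources 5 00:01:00)" myjob.sub
```

With five workers the helper prints the same resource list as the qsub command used earlier for this method; with 13 workers it reports the limit and returns a nonzero status.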

The fifth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first and third methods since it uses either a MATLAB script M-file or a MATLAB function M-file. Like the first and third methods, this method uses a user-defined PBS configuration. It differs from the other two methods because the script is slightly different. Two MATLAB statements specify a MATLAB pool since this method uses neither batch() nor submit() to make the MATLAB pool. Also, the MATLAB script M-file must quit at the end of the batch process. Another significant difference: This method uses one fewer DCS license than the first and third methods.

Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel loop
matlabpool close force;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "result" is a "reduction" variable.
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    numlabs = matlabpool('size');
    r = sprintf('                hostname                         numlabs  labindex  iteration');
    result = strvcat(result,r);
    r = sprintf('                -------------------------------  -------  --------  ---------');
    result = strvcat(result,r);
    tic;

    % PARALLEL LOOP 
    parfor i = 1:8
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
        result = strvcat(result,r);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel loop
    matlabpool close force;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('elapsed time:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r myscript
# matlab -nodisplay -r myfunction

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit;
$

Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00
332028.rossmann-ad myusername      standby  Job1          668   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a008.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a008.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a009.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a009.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a010.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a010.rcac.purdue.edu            4         1          8

Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.


SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu
Elapsed time in parallel region:   3.382151

Output shows the name of the compute node (a000) that processed the job submission file myjob.sub and the two serial regions. The job submission "scattered" among four different compute nodes (a007,a008,a009,a010) the four processor cores (four MATLAB labs) that processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop; labs process each iteration independently of the other iterations, so output from the iterations is in random order. You cannot assume that since there are four processor cores and eight iterations, each processor core processes two iterations of the parfor loop. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. The evidence that several processor cores process the eight iterations of the parfor loop in parallel is that the processing time decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Configuration Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased.

The sixth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is like the fifth method since it uses the same MATLAB M-files and the same job submission file myjob.sub. This method differs from the fifth since it uses the MATLAB 'local' configuration rather than a PBS configuration.

Run MATLAB to set the default parallel configuration to the MATLAB 'local' configuration:

$ matlab -nodisplay
>> defaultParallelConfig('local');
>> quit;
$

Submit the job as a single compute node with five processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=5,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115287.rossmann-ad myusername      standby  myjob.sub   24010   1   5    --  00:01 R 00:00

Job status shows five processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a007.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.


SERIAL REGION:  hostname:rossmann-a007.rcac.purdue.edu

Starting matlabpool using the 'local' configuration ... connected to 4 labs.
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          7
PARALLEL LOOP:  rossmann-a007.rcac.purdue.edu            4         1          8
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.


SERIAL REGION:  hostname:rossmann-a007.rcac.purdue.edu
Elapsed time in parallel loop:   4.783794

Output shows that processor cores on one compute node (a007) processed the entire job. Output also shows that a "matlabpool using the 'local' configuration" is connected to four MATLAB labs. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop; labs process each iteration independently of the other iterations, so output from the iterations is in random order. You cannot assume that since there are four processor cores and eight iterations, each processor core processes two iterations of the parfor loop. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. The evidence that several processor cores process the eight iterations of the parfor loop in parallel is that the processing time decreases as the number of processor cores increases.

Running this example with larger MATLAB pool sizes yields shorter runtimes:

Pool Size  Time (seconds)
---------  --------------
        1            17.2
        2             9.0
        4             4.8
        8             3.8
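
To put these timings in perspective, speedup and parallel efficiency relative to the one-lab run can be computed with a short awk filter. This is a minimal sketch: the function name speedup_table is hypothetical, and the input values are the four measurements listed above.

```shell
# Print pool size, speedup, and parallel efficiency relative to the first
# input line. Input lines: "pool_size seconds"; the first line must be the
# one-lab run.
speedup_table() {
    awk 'NR == 1 { base = $2 }
         { printf "%d %.2f %.2f\n", $1, base / $2, base / $2 / $1 }'
}

printf '%s\n' '1 17.2' '2 9.0' '4 4.8' '8 3.8' | speedup_table
```

Each output line holds the pool size, the speedup (one-lab time divided by the measured time), and the efficiency (speedup divided by pool size). Efficiency drops at eight labs because the fixed cost of starting and stopping the pool becomes a larger share of the run.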

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must be one greater than the value in matlabpool open.

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

matlabpool open 13;

$ qsub -l nodes=1:ppn=13,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using matlabpool (line 136)
Failed to open matlabpool. (For information in addition to the causing error,
validate the configuration 'local' in the Configurations Manager.)

Error in myscript (line 6)
matlabpool open 13;

Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool.
    This is caused by:
    You requested a minimum of 13 workers but only 12 workers are allowed with
    the local scheduler.
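
Because ppn for this method depends on the number in the matlabpool open statement, the two are easy to let drift apart. One way to keep them in step, sketched here with a hypothetical function name ppn_from_mfile, is to read the pool size from the M-file when building the qsub command. The sketch assumes the statement is written exactly as "matlabpool open N;" and does not check N against the limit of the 'local' configuration.

```shell
# Hypothetical helper: read the pool size from the first
# "matlabpool open N;" statement in the M-file $1 and print the matching
# ppn value (N + 1: one core per lab plus one for the MATLAB client).
ppn_from_mfile() {
    pool=$(sed -n 's/^[[:space:]]*matlabpool open \([0-9][0-9]*\);.*$/\1/p' "$1" | head -n 1)
    if [ -z "$pool" ]; then
        echo "no 'matlabpool open N;' statement found in $1" >&2
        return 1
    fi
    echo $((pool + 1))
}

# Usage (run in the directory that holds myscript.m):
#   qsub -l nodes=1:ppn=$(ppn_from_mfile myscript.m),walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub
```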

The seventh method of job submission uses the MATLAB Compiler mcc to compile a MATLAB function M-file with a PBS configuration and submits the compiled file to a PBS queue.

This method is similar to the third method since it uses a MATLAB function M-file. Like the third method, this method uses a user-defined PBS configuration. It differs from the third method because the script is slightly different. Two MATLAB statements specify a MATLAB pool since this method uses neither batch() nor submit() to make the MATLAB pool. Also, the MATLAB script M-file must quit at the end of the batch process. Another significant difference: This method uses one fewer DCS license than the third method.

Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements. Proceed with the MATLAB function M-file myfunction.m (when compiling a parfor statement, the parfor must be in a function, not in a script; this is a bug in MATLAB):

% FILENAME:  myscript.m

warning off all;

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
numlabs = matlabpool('size');
fprintf('                hostname                         numlabs  labindex  iteration\n')
fprintf('                -------------------------------  -------  --------  ---------\n')
tic;

% PARALLEL LOOP
parfor i = 1:8
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel loop
matlabpool close force;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    warning off all;

    % SERIAL REGION
    % Variable "result" is a "reduction" variable.
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    numlabs = matlabpool('size');
    r = sprintf('                hostname                         numlabs  labindex  iteration');
    result = strvcat(result,r);
    r = sprintf('                -------------------------------  -------  --------  ---------');
    result = strvcat(result,r);
    tic;

    % PARALLEL LOOP 
    parfor i = 1:8
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
        result = strvcat(result,r);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel loop
    matlabpool close force;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('Elapsed time in parallel loop:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the loaded versions. Set the default parallel configuration to your PBS configuration and quit MATLAB. Compile both the MATLAB script M-file mywrapper.m and the MATLAB function M-file myfunction.m:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit
$ mcc -m mywrapper.m myfunction.m
$ mkdir test
$ cp mywrapper test
$ cp run_mywrapper.sh test
$ cp myjob.sub test
$ cd test
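
Before copying, it can help to confirm that the compiler produced the two files the job needs. The following is a minimal sketch, assuming that mcc names the executable and its run script after the first M-file on the command line (here mywrapper and run_mywrapper.sh, as in the cp commands above). The function name check_mcc_outputs is hypothetical:

```shell
# Hypothetical helper: verify that the executable and the run script
# generated for application "$1" exist in directory "$2" (default: .).
check_mcc_outputs() (
    app=$1
    dir=${2:-.}
    status=0
    for f in "$dir/$app" "$dir/run_$app.sh"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    exit $status
)

# Example: check_mcc_outputs mywrapper && cp mywrapper run_mywrapper.sh myjob.sub test
```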

To obtain the name of the compute node which runs the compiler-generated script run_mywrapper.sh, insert the Linux commands echo and hostname before the echo statement that prints the line of dashes, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_mywrapper.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/mywrapper $*
fi
exit
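
The insertion described above can also be scripted. This is a minimal sketch, assuming the generated script contains a line that begins with echo followed by a quoted run of dashes, as in the listing above. The function name add_hostname_echo is hypothetical:

```shell
# Hypothetical helper: insert 'echo "<script name>"' and 'hostname' before
# the first line of dashes echoed by a compiler-generated run script.
add_hostname_echo() (
    script=$1
    tmp=$(mktemp) || exit 1
    awk -v label="$(basename "$script")" '
        !inserted && /^echo "-----/ {
            printf "echo \"%s\"\n", label
            print "hostname"
            print ""
            inserted = 1
        }
        { print }
    ' "$script" > "$tmp" || exit 1
    # Overwrite in place so the script keeps its execute permission.
    cat "$tmp" > "$script"
    rm -f "$tmp"
)

# Example: add_hostname_echo run_mywrapper.sh
```

Only the first matching line triggers the insertion, so the rest of the generated script is left unchanged.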

Submit the job, requesting a single compute node with one processor core and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

This job runs myjob.sub on a compute node; the compiled program in turn submits the parallel job. The first job must run at least as long as the job with the parallel loop, since it collects the results of the parallel job.
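
The wall-time relationship between the two jobs can be checked with simple arithmetic. This is a minimal sketch, assuming wall times in the HH:MM:SS form used by qsub. The function names are hypothetical:

```shell
# Convert an HH:MM:SS wall time to seconds (awk reads "08" as decimal 8).
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed only if the first job's wall time covers the parallel job's.
outer_covers_inner() {
    [ "$(walltime_to_seconds "$1")" -ge "$(walltime_to_seconds "$2")" ]
}

# Example: a 5-minute first job and a 1-minute parallel job.
if outer_covers_inner 00:05:00 00:01:00; then
    echo "outer wall time is sufficient"
fi
```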

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   1    --  00:05 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   1    --  00:05 R 00:00
115293.rossmann-ad myusername      standby  Job1        29390   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows that this job has submitted a second job with four processor cores (TSK) on four compute nodes (NDS).
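
The NDS and TSK columns can be pulled out of qstat output for scripting. This is a minimal sketch, assuming the column layout shown above. It counts fields from the right because the SessID column may be blank for a queued job. The function name qstat_nodes_tasks is hypothetical:

```shell
# Hypothetical helper: read "qstat -u" output on stdin and print
# "job-id nodes tasks" for each job row (rows start with a numeric job ID).
qstat_nodes_tasks() {
    awk '/^[0-9]+\./ { print $1, $(NF-5), $(NF-4) }'
}

# Example: qstat -u myusername | qstat_nodes_tasks
```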

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a021.rcac.purdue.edu
run_myfunction.sh
rossmann-a021.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa6
4:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/s
erver:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.

SERIAL REGION:  hostname:rossmann-a021.rcac.purdue.edu

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a021.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a022.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a023.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a024.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a021.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a022.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a023.rcac.purdue.edu            4         1          8
PARALLEL LOOP:  rossmann-a024.rcac.purdue.edu            4         1          7
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

SERIAL REGION:  hostname:rossmann-a021.rcac.purdue.edu
Elapsed time in parallel loop:   5.125206

Output shows the name of the compute node (a021) that ran the job submission file myjob.sub and the compiler-generated script run_mywrapper.sh, the name of the compute node (a021) that ran the two serial regions, and the names of the four compute nodes (a021, a022, a023, a024) that ran the four scattered processor cores (four MATLAB labs) that processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool, nor does it give each lab in the pool a unique value for variable labindex. The scrambled order of the iterations in the output comes from the parallel nature of the parfor loop: labs process each iteration independently of the other iterations, so output from the iterations appears in no particular order. You cannot assume that, because there are four processor cores and eight iterations, each processor core processes two iterations of the parfor loop. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. Each iteration pauses for two seconds, so eight iterations run in serial would take at least 16 seconds; the elapsed time of about five seconds shows that several processor cores processed the iterations in parallel. In general, the processing time decreases as the number of processor cores increases.
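
The elapsed time can be compared against two simple bounds. This is a minimal sketch, assuming every iteration takes the same time and the labs share the iterations evenly (which, as noted above, MATLAB does not guarantee). The function name parfor_bounds is hypothetical:

```shell
# Hypothetical helper: print the serial time and the ideal parallel time,
# in seconds, for a loop of equal-cost iterations.
parfor_bounds() (
    iterations=$1
    labs=$2
    seconds_each=$3
    rounds=$(( (iterations + labs - 1) / labs ))   # ceiling of iterations/labs
    echo "serial=$(( iterations * seconds_each )) ideal_parallel=$(( rounds * seconds_each ))"
)

parfor_bounds 8 4 2    # prints: serial=16 ideal_parallel=4
```

The measured time in this example lies between the two bounds and much closer to the parallel one; the difference is presumably scheduling and communication overhead.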

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, first increase the wall time in the qsub command to accommodate a longer-running job. Second, increase the wall time of mypbsconfig by using the Configuration Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased. Increase the value of MATLAB_Distrib_Comp_Server in the qsub command to match the new size of the pool.
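
One way to keep the license request in step with the pool size is to generate the qsub command from a single value. This is a minimal sketch that prints (but does not run) the command, assuming the resource string used earlier in this section. The function name pbs_pool_qsub is hypothetical:

```shell
# Hypothetical helper: print the qsub command for a compiled job whose
# MATLAB pool has "$1" labs and whose wall time is "$2".
pbs_pool_qsub() {
    echo "qsub -l nodes=1:ppn=1,walltime=$2,gres=MATLAB_Distrib_Comp_Server+$1 myjob.sub"
}

pbs_pool_qsub 4 00:05:00
```

The pool size in the statement matlabpool open still has to be changed separately in the M-file before recompiling.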

The eighth method of job submission uses the MATLAB Compiler mcc to compile a MATLAB function M-file with the 'local' configuration and submits the compiled file to a PBS queue.

This method is like the seventh method since it uses the same MATLAB M-files mywrapper.m and myfunction.m and the same job submission file myjob.sub. This method differs from the seventh since it uses the 'local' configuration (R2011a added support for compiling PCT code on the 'local' configuration).

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the versions loaded. Set the default parallel configuration to the 'local' configuration and compile the MATLAB function M-file:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('local');
>> quit
$ mcc -m mywrapper.m myfunction.m

To obtain the name of the compute node which runs the compiler-generated script run_mywrapper.sh, insert the Linux commands echo and hostname before the echo statement that prints the line of dashes, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_mywrapper.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/mywrapper $*
fi
exit

Submit the job, requesting a single compute node with four processor cores:

$ qsub -l nodes=1:ppn=4,walltime=00:05:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   4    --  00:05 R 00:00

Job status shows four processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a000.rcac.purdue.edu
run_myfunction.sh
rossmann-a000.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa6
4:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/s
erver:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.

SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
                hostname                         numlabs  labindex  iteration
                -------------------------------  -------  --------  ---------
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          2
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          4
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          5
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          6
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          1
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          3
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          8
PARALLEL LOOP:  rossmann-a000.rcac.purdue.edu            4         1          7
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

SERIAL REGION:  hostname:rossmann-a000.rcac.purdue.edu
Elapsed time in parallel loop:   5.126201

Output shows that processor cores on one compute node (a000) processed the entire job. The parfor loop does not set variable numlabs to the number of labs in the pool, nor does it give each lab in the pool a unique value for variable labindex. The scrambled order of the iterations in the output comes from the parallel nature of the parfor loop: labs process each iteration independently of the other iterations, so output from the iterations appears in no particular order. You cannot assume that, because there are four processor cores and eight iterations, each processor core processes two iterations of the parfor loop. While this example evenly distributed the iterations among the four labs, MATLAB may not use this schedule in other situations. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop. Each iteration pauses for two seconds, so eight iterations run in serial would take at least 16 seconds; the elapsed time of about five seconds shows that several processor cores processed the iterations in parallel. In general, the processing time decreases as the number of processor cores increases.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer-running job. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) or 12 (R2011a). Finally, the value of ppn in the qsub command must equal the value in matlabpool open.

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

matlabpool open 13;

$ qsub -l nodes=1:ppn=13,walltime=00:01:00 myjob.sub

{Error using matlabpool (line 136)
Failed to open matlabpool. (For information in addition to the causing error,
validate the configuration 'local' in the Configurations Manager.)

Error in myfunction (line 10)



Error in mywrapper (line 3)



Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool.
    This is caused by:
    You requested a minimum of 13 workers but only 12 workers are allowed with
    the local scheduler.
}
distcomp:matlabpool:RunValidation
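
A request like this one can be caught before submission. This is a minimal sketch, assuming the pool is opened with a literal "matlabpool open N" statement as in myfunction.m, and using the 12-worker limit reported in the error above as the default. The function name check_local_pool is hypothetical:

```shell
# Hypothetical helper: check that the pool size in M-file "$1" equals the
# requested ppn "$2" and does not exceed the local worker limit "$3" (default 12).
check_local_pool() (
    mfile=$1
    ppn=$2
    limit=${3:-12}
    size=$(sed -n 's/^[[:space:]]*matlabpool[[:space:]]\{1,\}open[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p' "$mfile" | head -n 1)
    if [ -z "$size" ]; then
        echo "no 'matlabpool open N' statement found in $mfile" >&2
        exit 1
    fi
    if [ "$size" -gt "$limit" ]; then
        echo "pool size $size exceeds the local limit of $limit workers" >&2
        exit 1
    fi
    if [ "$size" -ne "$ppn" ]; then
        echo "pool size $size does not match ppn=$ppn" >&2
        exit 1
    fi
)

# Example: check_local_pool myfunction.m 4 && qsub -l nodes=1:ppn=4,walltime=00:05:00 myjob.sub
```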

For more information about MATLAB Parallel Computing Toolbox:

MATLAB Parallel Computing Toolbox (spmd)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. PCT enables task and data parallelism on a multicore processor. It offers a shared-memory computing environment with a maximum of eight MATLAB workers (labs, threads; version R2009a) or 12 workers (labs, threads; version R2011a) running on the local configuration in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the coarse-grained parallelism of a parallel region (spmd) in a pool job. Areas of application include SPMD (single program, multiple data) problems.

This section illustrates eight methods of submitting a small, parallel MATLAB program with a parallel region (spmd statement) as a batch MATLAB pool job to a PBS queue. The MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each parallel region of the pool. The MATLAB function system(), called with the Linux command hostname, returns two values: a numerical status code and the name of the compute node on which it runs.

The first method runs on a front end a MATLAB client which calls the MATLAB batch() function with a user-defined PBS configuration. Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out seven licenses: one MATLAB license for the client running on the front end, one PCT license, and five DCS licenses. One DCS license runs the batch session (including the two serial regions in the MATLAB M-code) while the other four run the spmd statement. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running function batch() and quitting MATLAB. The five DCS licenses remain active between running function batch() and job completion, even when the serial portion of the program is running. The DCS licenses do not appear in the output of function license().

The second method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB batch() function with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The third method runs on a front end a MATLAB client which calls the MATLAB submit() function with the 'torque' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'torque' scheduler, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out seven licenses: one MATLAB license for the client running on the front end, one PCT license, and five DCS licenses. One DCS license runs the batch session (including the two serial regions in the MATLAB M-code) while the other four run the spmd statement. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. The five DCS licenses remain active between running function submit() and job completion, even when the serial portion of the program is running. The DCS licenses do not appear in the output of function license().

The fourth method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB submit() function with the 'local' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'local' scheduler, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The fifth method uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS configuration which scatters the MATLAB workers onto different compute nodes. Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the spmd statement. This job is completely off the front end.

The sixth method uses the PBS qsub command to submit to a compute node a MATLAB client which interprets an M-file with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on a compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. This job is completely off the front end.

The seventh method uses the MATLAB Compiler mcc and the default parallel configuration set to a PBS configuration to compile a MATLAB M-file and submits the compiled file to a PBS queue. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses a PBS configuration during compilation, this method, when executed, uses the MATLAB Distributed Computing Server; so, it requires and checks out four DCS licenses during the execution of the spmd statement. The serial portions of this job do not use a DCS license. This job is completely off the front end.

The eighth method uses the MATLAB Compiler mcc and the default parallel configuration set to the 'local' configuration to compile a MATLAB M-file and submits the compiled file to a PBS queue. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses the 'local' configuration, this method, when executed, uses no license. (Support for running compiled PCT code on the 'local' configuration was added in R2011a; this feature removes the need for DCS licenses in some cases.) This job is completely off the front end.

The MATLAB Compiler license is a lingering license. Using the compiler locks its license for at least 30 minutes. Each time you issue the compiler command, you reset the 30-minute timer. When running mcc at the MATLAB prompt, you hold the license as long as MATLAB remains open. To release the license, quit MATLAB. Quitting MATLAB is necessary to release the license, but not sufficient. When you quit MATLAB before the 30-minute timer runs out, then the license remains checked out for the remaining time. When running mcc at the Linux prompt, the 30-minute timer starts or restarts. The license manager tracks MATLAB licenses per user per cluster. If you run the MATLAB Compiler on two different clusters, you use two Compiler licenses. To minimize your license usage, run the Compiler on one cluster.
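
The timer arithmetic can be sketched as follows, assuming the license is released 30 minutes after the most recent compiler invocation and that MATLAB itself has been closed. Times are minutes on an arbitrary clock, and the function name mcc_release_minute is hypothetical:

```shell
# Hypothetical helper: given the minutes at which mcc was run, print the
# earliest minute at which the Compiler license could be released.
mcc_release_minute() (
    last=$1
    for t in "$@"; do
        if [ "$t" -gt "$last" ]; then
            last=$t
        fi
    done
    echo $(( last + 30 ))
)

mcc_release_minute 0 10 25    # prints: 55
```

Three compilations spread over 25 minutes therefore hold the license for at least 55 minutes in total, which is why batching compilations together keeps license usage down.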

You can share your compiled program with colleagues who do not have MATLAB licenses. The compiled program carries its required licenses with it, so you do not need to install any license file on your colleague's machine.

The following table summarizes MATLAB license usage:

Method  Description                                          MATLAB  PCT  DCS                     mcc  Limitations
------  ---------------------------------------------------  ------  ---  ----------------------  ---  -----------------------------------------------------------
1       batch() with user-defined PBS configuration          1       1    Matlabpool + 1          0    number of MATLAB, PCT, DCS licenses purchased
2       batch() with 'local' configuration, qsub             1       1    0                       0    local configuration with 8 (R2009a) and 12 (R2011a) workers
3       submit() with 'torque' scheduler                     1       1    MaximumNumberOfWorkers  0    number of MATLAB, PCT, DCS licenses purchased
4       submit() with 'local' scheduler, qsub                1       1    0                       0    local scheduler with 8 (R2009a) and 12 (R2011a) workers
5       qsub with user-defined PBS configuration             1       1    pool size               0    number of MATLAB, PCT, DCS licenses purchased
6       qsub with 'local' configuration                      1       1    0                       0    local configuration with 8 (R2009a) and 12 (R2011a) workers
7       Compiler with user-defined PBS configuration, qsub   0       0    pool size               1    number of DCS licenses purchased
8       Compiler with 'local' configuration, qsub            0       0    0                       1    local configuration with 8 (R2009a) and 12 (R2011a) workers
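
The DCS column of the table can be expressed as a small function. This is a minimal sketch, assuming the second argument is the pool size (or, for method 3, the MaximumNumberOfWorkers value). The function name dcs_licenses is hypothetical:

```shell
# Hypothetical helper: print the number of DCS licenses used by
# submission method "$1" (1-8) with pool size "$2".
dcs_licenses() {
    case $1 in
        1)       echo $(( $2 + 1 )) ;;   # pool plus the batch session
        3|5|7)   echo "$2" ;;            # 3: MaximumNumberOfWorkers; 5, 7: pool size
        2|4|6|8) echo 0 ;;               # 'local' configuration or scheduler
        *)       echo "unknown method: $1" >&2; return 1 ;;
    esac
}

dcs_licenses 1 4    # prints: 5
dcs_licenses 7 4    # prints: 4
```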

Prepare a MATLAB pool program in the form of a MATLAB script M-file and a MATLAB function M-file with appropriate filenames, here named myscript.m and myfunction.m:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('                    hostname                         numlabs  labindex\n')
fprintf('                    -------------------------------  -------  --------\n')
tic;

% PARALLEL REGION
spmd
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL REGION:  %-31s  %7d  %8d\n', name,numlabs,labindex)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel region
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel region:   %f\n', elapsed_time)

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "r" is a "composite object."
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    r = sprintf('                  hostname                         numlabs  labindex');
    result = strvcat(result,r);
    r = sprintf('                  -------------------------------  -------  --------');
    result = strvcat(result,r);
    tic;

    % PARALLEL REGION
    spmd
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL REGION:  %-31s  %7d  %8d', name,numlabs,labindex);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel region
    for ndx=1:length(r)          % concatenate composite object "r"
        result = strvcat(result,r{ndx});
    end
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('Elapsed time in parallel region:   %f', elapsed_time);
    result = strvcat(result,r);

end

Both M-files display the names of all compute nodes which run the job and the associated lab IDs. The script M-file uses fprintf() to display the results. The function M-file returns a single value which contains a concatenation of the results.

The execution of a pool job starts with a worker (batch session) executing the statements of the first serial region up to the spmd block, when it pauses. A set of workers (the pool) executes the spmd block. When they finish, the batch session resumes by executing the second serial region. The code displays the names of the compute nodes running the batch session and the worker pool.

The first method of job submission uses function batch() to offload work from the MATLAB client running on the front end to a batch session running on a worker (compute node). The batch session takes on the responsibilities of supervising another set of workers, called the pool, and accumulating the results. The batch session and the pool cooperate on processing a single program. Each worker in the pool has a unique identifier and can determine its behavior from that ID. The workers of the pool simultaneously process their respective portions of the workload of the parallel region so that the parallel region might run faster than the equivalent serial version. A pool size of N requires N+1 workers (processor cores). The source code is a MATLAB M-file (MATLAB function batch() accepts either a script M-file or a function M-file).

On the front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, view the current default parallel configuration; most likely, it is the 'local' configuration. Call MATLAB function batch() to make a four-lab pool on which to run the MATLAB code in the file myscript.m. In the call, replace the 'local' configuration by specifying your PBS configuration which you previously designed with the Configuration Manager (using the 'local' configuration at this point will run the spmd statement on the front end). This particular PBS configuration scatters the labs to different compute nodes to verify that a four-lab spmd statement actually uses five processor cores. Also, capture job output in the diary. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, view results by viewing the diary. When you do not want to keep the files of your job, use MATLAB function destroy() to erase them from your current working directory. After this job finishes, the default parallel configuration remains the 'local' configuration.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> disp(defaultParallelConfig);
local
>> pjob=batch('myscript','Matlabpool',4,'Configuration','mypbsconfig','CaptureDiary',true);
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> license('inuse')
distrib_computing_toolbox
matlab
>> disp(pjob.get('State'))
finished
>> license('inuse')
distrib_computing_toolbox
matlab
>> pjob.diary;
>> ls -l
>> pjob.destroy;
>> ls -l
>> disp(defaultParallelConfig);
local
>> quit;
$

The license name distrib_computing_toolbox refers to the Parallel Computing Toolbox.

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115204.rossmann-a myusername standby  Job1             5   5    --  00:01 Q   --

Job status shows that the MATLAB client submitted this job as five compute nodes (NDS), each with one processor core (TSK), and with a requested wall time of one minute. The call to function batch() specifies four labs to evaluate the parallel regions (spmd statement). The fifth lab runs the batch session, myscript.m, and accumulates the results. This arrangement explains the presence of five DCS licenses.

View job output from the diary:

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu

                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 2: 
  PARALLEL REGION:  rossmann-a637.rcac.purdue.edu            4         2
Lab 3: 
  PARALLEL REGION:  rossmann-a636.rcac.purdue.edu            4         3
Lab 4: 
  PARALLEL REGION:  rossmann-a635.rcac.purdue.edu            4         4
Lab 1: 
  PARALLEL REGION:  rossmann-a638.rcac.purdue.edu            4         1

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
Elapsed time in parallel region:   3.204323

Output shows that the value of the property Matlabpool (4), a nonnegative integer, set the number of labs in the pool that processed the parallel region. Output also shows the fifth lab, which runs the batch session, including the two serial portions of the MATLAB pool program. Because the MATLAB pool requires the lab running the batch session in addition to the N labs in the pool, at least N+1 processor cores must be available on the cluster.

The MATLAB client scattered the five processor cores (five MATLAB labs) among five different compute nodes. A processor core of one compute node (a639) processed the batch session, including the two serial regions, while four processor cores of four different compute nodes (a638, a637, a636, a635) processed the parallel region. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of labs (4) in the pool and assigns to each lab running the parallel region a unique value for variable labindex. There are four labs in the pool, so there are four lab IDs. Output shows the four labs in scrambled order since each lab processes its copy of the parallel region independently of the others. Finally, output shows the time that the four labs spent running the parallel region. The time implies that the four labs ran in parallel. Had they run in serial, the time would have been at least eight seconds since each of the four labs pauses for two seconds inside the spmd statement.
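
The timing argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of the MATLAB example itself; the variable names are illustrative, and the measured value is copied from the diary output shown above:

```shell
# Sketch: compare a measured spmd time with the serial and parallel lower bounds.
# All variable names are illustrative; values come from the example above.

labs=4              # labs in the pool (value of Matlabpool)
pause_s=2           # seconds each lab pauses inside the spmd statement
measured=3.204323   # "Elapsed time in parallel region" from the diary

serial_min=$((labs * pause_s))   # lower bound if the labs ran one after another
parallel_min=$pause_s            # lower bound if the labs ran at the same time

# awk does the floating-point comparison; shell arithmetic is integer-only.
verdict=$(awk -v t="$measured" -v s="$serial_min" \
    'BEGIN { if (t < s) print "parallel"; else print "serial" }')

echo "serial lower bound:   ${serial_min} s"
echo "parallel lower bound: ${parallel_min} s"
echo "measured ${measured} s -> consistent with ${verdict} execution"
```

The measured time exceeds the two-second parallel lower bound because it also includes the overhead of starting and synchronizing the labs.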

After calling the MATLAB function batch(), you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with the PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy().

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','Configuration','mypbsconfig');
>> pjob=findJob(sched,'State','finished');
>> pjob.diary;
>> pjob.destroy;
>> quit;
$

To apply the first method of job submission to a function M-file, use one of the following sequences:

>> pjob=batch('myfunction','Matlabpool',4,'Configuration','mypbsconfig','CaptureDiary',true);
>> disp(pjob.get('State'))
finished
>> pjob.diary

>> pjob=batch('myfunction',1,{},'Matlabpool',4,'Configuration','mypbsconfig');                    
>> disp(pjob.get('State'))
finished
>> result=getAllOutputArguments(pjob);
>> result{1}

>> pjob=batch(@myfunction,1,{},'Matlabpool',4,'Configuration','mypbsconfig');
>> disp(pjob.get('State'))
finished
>> result=getAllOutputArguments(pjob);
>> result{1}

Function batch() offers an alternate mode of operation. If you omit the property Configuration and its value from the argument list of batch(), then batch() reads the default parallel configuration to discover your instructions about submission. Before calling batch(), set the default parallel configuration to the PBS configuration that you previously designed with the Configuration Manager. This change of the default parallel configuration is immediate and permanent; the PBS configuration remains the default parallel configuration during subsequent runs of MATLAB.

To scale up this method to handle a real application, use the Configuration Manager in the Parallel menu to increase the wall time to accommodate a longer running job. Enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch(). The maximum possible size of the pool is one less than the number of DCS licenses purchased.

The second method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first method since it uses function batch() and the same MATLAB M-file, either myscript.m or myfunction.m. This method differs from the first because the MATLAB client runs on a compute node rather than on the front end and the client uses the MATLAB 'local' configuration rather than a user-defined PBS configuration.

Prepare a MATLAB script M-file that calls MATLAB function batch(). The call makes a four-lab pool on which to run the MATLAB code in the file myscript.m, specifies the 'local' configuration, and captures job output in the diary. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

pjob=batch('myscript','Matlabpool',4,'Configuration','local','CaptureDiary',true);
pjob.wait;
pjob.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclbatch

Submit the job as a single compute node with six processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=6,walltime=0:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m; one processor core runs the two serial regions of the MATLAB M-file; four processor cores run the four copies of the parallel region.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername      standby  myjob.sub   30197   1   6    --  00:01 R 00:00

Job status shows six processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

mylclbatch.m
rossmann-a639.rcac.purdue.edu
SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu

                  hostname                         numlabs  labindex
                  -------------------------------  -------  --------
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         1
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         2
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         3
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         4

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
Elapsed time in parallel region:   3.406318

Output shows that the value of the property Matlabpool (4), a nonnegative integer, set the number of labs in the pool that processed the parallel region. While output does not explicitly show the fifth lab, that lab runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the MATLAB pool requires the lab running the batch session in addition to the N labs in the pool, at least N+1 processor cores must be available on the cluster.

Output shows that processor cores on one compute node (a639) processed the entire job. One processor core processed myjob.sub and mylclbatch.m. One processor core processed the batch session myscript.m, which includes the two serial regions, while four processor cores processed the parallel region. Variable numlabs shows the total number of labs in the pool (4). Variable labindex shows the ID of an individual lab. There are four labs, so there are four lab IDs. Finally, output shows the time that the four labs spent running the parallel region. The time implies that the four labs ran in parallel. Had they run in serial, the time would have been at least eight seconds since each of the four labs pauses for two seconds inside the spmd statement.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply the second method of job submission to a function M-file, modify mylclbatch.m with one of the following sequences:

pjob=batch('myfunction','Matlabpool',4,'Configuration','local','CaptureDiary',true);
pjob.wait;
pjob.diary

pjob=batch('myfunction',1,{},'Matlabpool',4,'Configuration','local');
pjob.wait;
result = getAllOutputArguments(pjob);
result{1}

pjob=batch(@myfunction,1,{},'Matlabpool',4,'Configuration','local');
pjob.wait;
result = getAllOutputArguments(pjob);
result{1}

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch(). The maximum possible size of the pool is one less than the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011b). Finally, the value of ppn must be two greater than the value of Matlabpool.
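
The sizing rules in the preceding paragraph can be sketched as a small shell helper. This is an illustration only: local_batch_ppn is a hypothetical function name, not a MATLAB or PBS command, and the default limit of 12 workers is the R2011b value quoted in this guide.

```shell
# Sketch: derive ppn for a batch() job that uses the 'local' configuration.
# local_batch_ppn is a hypothetical helper, not a MATLAB or PBS command.

local_batch_ppn() {
    pool=$1                  # value of the Matlabpool property
    limit=${2:-12}           # worker limit of 'local' (12 in R2011b)
    workers=$((pool + 1))    # pool labs plus the lab running the batch session
    if [ "$workers" -gt "$limit" ]; then
        echo "pool of $pool needs $workers workers; limit is $limit" >&2
        return 1
    fi
    echo $((pool + 2))       # one more core for the MATLAB client itself
}

# Print (do not run) the qsub command for a four-lab pool.
ppn=$(local_batch_ppn 4) &&
    echo "qsub -l nodes=1:ppn=${ppn},walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub"
```

For a four-lab pool the helper yields ppn=6, which matches the qsub command used earlier for this method. A pool of 12 is rejected because it would need 13 workers.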

Specifying a MATLAB pool with 12 labs means a total of 13 workers. This exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

pjob=batch('myscript','Matlabpool',12,'Configuration','local','CaptureDiary',true);

$ qsub -l nodes=1:ppn=14,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using batch (line 172)
You requested a minimum of 13 workers but only 12 workers are allowed with the
local scheduler.

Error in mylclbatch (line 6)
pjob=batch('myscript','Matlabpool',12,'Configuration','local','CaptureDiary',true);}

The third method of job submission uses function submit(). It offloads two responsibilities from the MATLAB client running on the front end to a batch session running on a worker (compute node): supervising another set of workers, called the pool, and accumulating the results. The batch session and the pool cooperate on processing a single program. Each worker in the pool has a unique identifier and can determine its behavior from that ID. The workers of the pool simultaneously process their respective portions of the workload of the parallel region so that the parallel region might run faster than the equivalent serial version. A pool size of N requires N+1 workers (processor cores). The source code must be a MATLAB function M-file since MATLAB function submit() accepts only a function M-file.

Prepare a MATLAB script M-file which does the following: finds the MATLAB 'torque' scheduler; requests five processor cores scattered among five different compute nodes, one minute of wall time for this short job, and the PCT and DCS licenses; defines a MATLAB pool job with five workers (labs) and one task; and calls MATLAB function submit(). Use an appropriate filename, here named mypbssubmit.m:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler', 'type', 'torque');
set(sched,'ClusterMatlabRoot',matlabroot);
set(sched,'SubmitArguments','-l walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+5');
pjob = createMatlabPoolJob(sched);
set(pjob,'MinimumNumberOfWorkers',5);
set(pjob,'MaximumNumberOfWorkers',5);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

On a front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, run mypbssubmit. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, get all output arguments into a cell array and display the results. When you do not want to keep the files of a job, use MATLAB function destroy() to erase them from your current working directory.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> mypbssubmit
FINISHED SUBMITTING
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> disp(pjob.get('State'))
finished
>> result = getAllOutputArguments(pjob)  

result = 

    [9x68 char]

>> result{1}
>> ls -l
>> pjob.destroy;
>> ls -l
>> quit
$

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115265.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --

Job status shows that the MATLAB client submitted this job as five compute nodes (NDS), each with one processor core (TSK), and the requested wall time of one minute. The call to function submit() specifies five labs as the minimum and maximum number of labs of the MATLAB pool. Four labs evaluate the four parallel regions. The fifth lab runs the batch session myfunction.m, including the two serial regions, and accumulates the results. This arrangement explains the presence of five DCS licenses.

View job output:

SERIAL REGION:  hostname:rossmann-a636.rcac.purdue.edu
               
                  hostname                         numlabs  labindex
                  -------------------------------  -------  --------
PARALLEL REGION:  rossmann-a634.rcac.purdue.edu            4         1
PARALLEL REGION:  rossmann-a633.rcac.purdue.edu            4         2
PARALLEL REGION:  rossmann-a632.rcac.purdue.edu            4         3
PARALLEL REGION:  rossmann-a631.rcac.purdue.edu            4         4

SERIAL REGION:  hostname:rossmann-a636.rcac.purdue.edu               
Elapsed time in parallel region:   2.878010

Output shows that the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (5) set the total number of workers: four labs in the pool, which processed the parallel region, plus a fifth lab, which runs the batch session, including the two serial portions of the MATLAB pool program. Because the N workers include the lab running the batch session, at least N processor cores must be available on the cluster.

The MATLAB client scattered the five processor cores (five MATLAB labs) among five different compute nodes. A processor core of one compute node (a636) processed the batch session, including the two serial regions, while four processor cores of four different compute nodes (a634, a633, a632, a631) processed the parallel region. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of labs (4) in the pool and assigns to each lab running the parallel region a unique value for variable labindex. There are four labs in the pool, so there are four lab IDs. Finally, output shows the time that the four labs spent running the parallel region. The time implies that the four labs ran in parallel. Had they run in serial, the time would have been at least eight seconds since each of the four labs pauses for two seconds inside the spmd statement.

After the "FINISHED SUBMITTING" message, you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy().

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','type','torque');
>> pjob=findJob(sched,'State','finished');
>> result=getAllOutputArguments(pjob);
>> result{1}
>> pjob.destroy;
>> quit
$

For practice, modify mypbssubmit.m to rerun this example as a single compute node with five processor cores:

set(sched,'SubmitArguments','-l nodes=1:ppn=5,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+5');

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115273.rossmann-a myusername standby  Job1          --    1   5    --  00:01 Q   --

The MATLAB client submitted this job as a single compute node (NDS) with five processor cores (TSK). The lab that runs the batch session and the four labs that run the spmd statement reside on the same compute node.

View job output:

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
               
                  hostname                         numlabs  labindex
                  -------------------------------  -------  --------
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         1
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         2
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         3
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         4

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu               
Elapsed time in parallel region:   2.964572

Output shows that processor cores of one compute node (a639) processed the entire job.

Function submit() offers an alternate mode of operation. It can read your PBS configuration made previously with the Configuration Manager. Here is an alternate version of mypbssubmit.m which relies on mypbsconfig for the several details of job submission:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','Configuration','mypbsconfig');
pjob = createMatlabPoolJob(sched);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

To scale up this method to handle a real application, increase the wall time in mypbssubmit.m to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of DCS licenses purchased. Finally, increase the value of MATLAB_Distrib_Comp_Server to match the value of MinimumNumberOfWorkers.
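
Keeping the license request in step with the worker count is easy to get wrong when editing the string by hand. The following is a minimal sketch of generating the SubmitArguments value; torque_submit_args is a hypothetical helper name, and the string layout simply mirrors the one used in mypbssubmit.m above:

```shell
# Sketch: build the SubmitArguments string for a pool job with a given worker count.
# torque_submit_args is a hypothetical helper name.

torque_submit_args() {
    workers=$1                 # MinimumNumberOfWorkers = MaximumNumberOfWorkers
    walltime=${2:-00:01:00}    # requested wall time, hh:mm:ss
    # One PCT license plus one DCS license per worker, as in the example above.
    printf -- '-l walltime=%s,gres=Parallel_Computing_Toolbox+1%%MATLAB_Distrib_Comp_Server+%s\n' \
        "$walltime" "$workers"
}

torque_submit_args 5             # the five-worker example
torque_submit_args 9 04:00:00    # a larger, longer-running job
```

Paste the printed string into the set(sched,'SubmitArguments',...) call, and set MinimumNumberOfWorkers and MaximumNumberOfWorkers to the same worker count.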

The fourth method of job submission uses the PBS qsub command to submit a pool job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the third method since it uses function submit() and the same MATLAB function M-file myfunction.m (MATLAB function submit() accepts only a function M-file). This method differs from the third because the MATLAB client runs on a compute node rather than on the front end and the client uses the MATLAB 'local' scheduler rather than the MATLAB 'torque' scheduler.

Prepare a MATLAB script M-file which finds the MATLAB 'local' scheduler, which defines a MATLAB pool job with five workers (labs) and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mylclsubmit.m:

% FILENAME:  mylclsubmit.m

!echo "mylclsubmit.m"
!hostname

sched = findResource('scheduler', 'type', 'local');
set(sched,'ClusterMatlabRoot',matlabroot);
pjob = createMatlabPoolJob(sched);
set(pjob,'MinimumNumberOfWorkers',5);
set(pjob,'MaximumNumberOfWorkers',5);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

pjob.wait;
result = getAllOutputArguments(pjob);
result{1}
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclsubmit

Submit the job as a single compute node with six processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=6,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclsubmit.m; one processor core runs the two serial regions of the batch session; four processor cores run the four copies of the parallel region.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername      standby  myjob.sub   30197   1   6    --  00:01 R 00:00

Job status shows six processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

mylclsubmit.m
rossmann-a639.rcac.purdue.edu
>> FINISHED SUBMITTING

ans =

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu

                  hostname                         numlabs  labindex
                  -------------------------------  -------  --------
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         1
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         2
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         3
PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         4

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
Elapsed time in parallel region:   3.587376

Output shows that the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (5) set the total number of workers: four labs in the pool, which processed the parallel region, plus a fifth lab. While output does not explicitly show the fifth lab, that lab runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the N workers include the worker running the batch session, at least N processor cores must be available on the cluster.

The processor cores of one compute node (a639) processed the entire job. One processor core processed myjob.sub and mylclsubmit.m. One processor core processed the batch session myfunction.m, which includes the two serial regions, while four processor cores processed the parallel region. Variable numlabs shows the total number of labs in the pool, which in this case is four. Variable labindex shows the ID of an individual lab. There are four labs, so there are four lab IDs. Finally, output shows the time that the four labs spent running the parallel region. The time implies that the four labs ran in parallel. Had they run in serial, the time would have been at least eight seconds since each of the four labs pauses for two seconds inside the spmd statement.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011b). Finally, the value of ppn must be one greater than the value of MaximumNumberOfWorkers.
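
Before submitting, the ppn and worker values can be cross-checked against the rules in the preceding paragraph. This is a rough sketch under stated assumptions: check_local_submit is a hypothetical helper, and the default limit of 12 workers is the R2011b value quoted in this guide.

```shell
# Sketch: check a qsub request against a submit() job on the 'local' scheduler.
# check_local_submit is a hypothetical helper; 12 is the R2011b worker limit.

check_local_submit() {
    ppn=$1                # value of ppn in the qsub command
    workers=$2            # MinimumNumberOfWorkers = MaximumNumberOfWorkers
    limit=${3:-12}        # worker limit of the 'local' scheduler
    if [ "$workers" -gt "$limit" ]; then
        echo "error: $workers workers exceeds the local limit of $limit"
        return 1
    fi
    if [ "$ppn" -ne $((workers + 1)) ]; then
        echo "error: ppn should be $((workers + 1)) for $workers workers"
        return 1
    fi
    # One worker runs the batch session; the rest form the pool.
    echo "ok: pool of $((workers - 1)) labs, ppn=$ppn"
}

check_local_submit 6 5            # the example above
check_local_submit 14 13 || true  # too many workers for 'local'
```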

Specifying 13 workers to achieve a MATLAB pool with 12 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

set(pjob,'MinimumNumberOfWorkers',13);
set(pjob,'MaximumNumberOfWorkers',13);

$ qsub -l nodes=1:ppn=14,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using distcomp.simpleparalleljob/pSetMinimumNumberOfWorkers (line 59)
MinimumNumberOfWorkers must be the same as or less than MaximumNumberOfWorkers
for a job

Error in mylclsubmit (line 9)
set(pjob,'MinimumNumberOfWorkers',13);

The fifth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first and third methods since it uses either a MATLAB script M-file or a MATLAB function M-file. Like the first and third methods, this method uses a user-defined PBS configuration. It differs from the other two methods because the script is slightly different. Two MATLAB statements specify a MATLAB pool since this method uses neither batch() nor submit() to make the MATLAB pool. Also, the MATLAB script M-file must quit at the end of the batch process. Another significant difference: This method uses one fewer DCS license than the first and third methods.

Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
fprintf('                    hostname                         numlabs  labindex\n')
fprintf('                    -------------------------------  -------  --------\n')
tic;

% PARALLEL REGION
spmd
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL REGION:  %-31s  %7d  %8d\n', name,numlabs,labindex)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel region
matlabpool close force;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel region:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "r" is a "composite object."
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    r = sprintf('                  hostname                         numlabs  labindex');
    result = strvcat(result,r);
    r = sprintf('                  -------------------------------  -------  --------');
    result = strvcat(result,r);
    tic;

    % PARALLEL REGION
    spmd
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL REGION:  %-31s  %7d  %8d', name,numlabs,labindex);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel region 
    for ndx=1:length(r)          % concatenate composite object "r"
        result = strvcat(result,r{ndx});
    end
    matlabpool close force;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('elapsed time:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r myscript
# matlab -nodisplay -r myfunction

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit;
$

Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00
332028.rossmann-ad myusername      standby  Job1          668   4   4    --  00:01 R 00:00

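At first, job status shows only the job for myjob.sub: one processor core (TSK) on one compute node (NDS). Then, job status also shows the second job (Job1), which the matlabpool statement submitted through the PBS configuration: four processor cores (TSK) on four compute nodes (NDS).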
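
Because this method spreads its work across two PBS jobs, it can help to total the resources that qstat reports. The following is a minimal sketch; sum_cores is a hypothetical helper, and the field positions (6 for NDS, 7 for TSK) are an assumption based on the listing shown above:

```shell
# Sketch: total the NDS and TSK columns of "qstat -u" output.
# sum_cores is a hypothetical helper; fields 6 and 7 are assumed from the listing above.

sum_cores() {
    # Only rows that begin with a numeric job ID are job entries.
    awk '$1 ~ /^[0-9]+\./ { nds += $6; tsk += $7 }
         END { printf "nodes=%d cores=%d\n", nds, tsk }'
}

# Sample rows copied from the second listing; on the cluster, pipe in
# the output of: qstat -u myusername
sum_cores <<'EOF'
332026.rossmann-ad myusername      standby  myjob.sub   31850   1   1    --  00:01 R 00:00
332028.rossmann-ad myusername      standby  Job1          668   4   4    --  00:01 R 00:00
EOF
```

For the sample rows the total is five processor cores: one for the MATLAB client and four for the labs of the pool.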

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 2:
  PARALLEL REGION:  rossmann-a638.rcac.purdue.edu            4         2
Lab 1:
  PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         1
Lab 3:
  PARALLEL REGION:  rossmann-a637.rcac.purdue.edu            4         3
Lab 4:
  PARALLEL REGION:  rossmann-a636.rcac.purdue.edu            4         4

Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.


SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
Elapsed time in parallel region:   3.382151

Output shows the name of one compute node (a639) that processed the job submission file myjob.sub and the two serial regions. The second job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a639, a638, a637, a636) that processed the parallel region. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of labs (4) in the pool and assigns to each lab running the parallel region a unique value for variable labindex. There are four labs in the pool, so there are four lab IDs. Output shows the four labs in scrambled order since each lab processes its copy of the parallel region independently of the others. Finally, output shows the time that the four labs spent running the parallel region. The time implies that the four labs ran in parallel. Had they run in serial, the time would have been at least eight seconds since each of the four labs pauses for two seconds inside the spmd statement.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Configuration Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased.

The sixth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is like the fifth method since it uses the same MATLAB M-files and the same job submission file myjob.sub. This method differs from the fifth since it uses the MATLAB 'local' configuration rather than a PBS configuration.

Run MATLAB to set the default parallel configuration to the MATLAB 'local' configuration:

$ matlab -nodisplay
>> defaultParallelConfig('local');
>> quit;
$

Submit the job as a single compute node with five processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=5,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115287.rossmann-ad myusername      standby  myjob.sub   24010   1   5    --  00:01 R 00:00

Job status shows five processor cores (TSK) on one compute node (NDS).
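When checking many jobs, the NDS and TSK columns can be picked out of a qstat row with awk. This is a minimal sketch that assumes the whitespace-separated column order shown above; the function name qstat_nds_tsk is hypothetical.

```shell
#!/bin/sh
# Print the NDS (compute nodes) and TSK (processor cores) fields of one qstat row.
# Assumed column order: JobID Username Queue Jobname SessID NDS TSK Memory Time S Time
qstat_nds_tsk() {
    printf '%s\n' "$1" | awk '{ print "NDS=" $6, "TSK=" $7 }'
}

row='115287.rossmann-ad myusername      standby  myjob.sub   24010   1   5    --  00:01 R 00:00'
qstat_nds_tsk "$row"
```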

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu

Starting matlabpool using the 'local' configuration ... connected to 4 labs.
                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 1:
  PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         1
Lab 2:
  PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         2
Lab 3:
  PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         3
Lab 4:
  PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         4

Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.


SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
Elapsed time in parallel region:   3.425426

Output shows that processor cores on one compute node (a639) processed the entire job. Output also shows that a "matlabpool using the 'local' configuration" is connected to four MATLAB labs. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of labs in the pool and assigns to each lab in the pool a unique value for variable labindex. Finally, output shows the time that the four labs spent running the four parallel regions. The time implies that the four labs run in parallel. Had they run in serial, then the time would have been at least eight seconds since each lab pauses for two seconds.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must be one greater than the value in matlabpool open.
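The two constraints in the preceding paragraph (ppn is one greater than the pool size, and the pool size may not exceed the 'local' limit) can be sketched as a small shell helper. The function name local_pool_request is hypothetical, and the limit of 12 workers is the R2011a/R2011b value quoted in this guide; add a walltime to the printed string as needed.

```shell
#!/bin/sh
# Hypothetical helper: print the qsub resource request for a 'local' pool.
# Per this guide, ppn must be one greater than the pool size, and the
# 'local' configuration allows at most 12 workers (8 in R2009a).
local_pool_request() {
    pool=$1
    max_workers=12
    if [ "$pool" -lt 1 ] || [ "$pool" -gt "$max_workers" ]; then
        echo "pool size must be between 1 and $max_workers" >&2
        return 1
    fi
    echo "nodes=1:ppn=$((pool + 1)),gres=Parallel_Computing_Toolbox+1"
}

local_pool_request 4
```

For a pool of four labs, the helper prints the same nodes, ppn, and gres values used in the qsub command of this method.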

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

matlabpool open 13;

$ qsub -l nodes=1:ppn=13,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using matlabpool (line 136)
Failed to open matlabpool. (For information in addition to the causing error,
validate the configuration 'local' in the Configurations Manager.)

Error in myscript (line 6)
matlabpool open 13;

Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool.
    This is caused by:
    You requested a minimum of 13 workers but only 12 workers are allowed with
    the local scheduler.

The seventh method of job submission uses the MATLAB Compiler mcc to compile a MATLAB M-file with a PBS configuration and submits the compiled file to a PBS queue.

This method is similar to the first and third methods since it uses either a MATLAB script M-file or a MATLAB function M-file. Like the first and third methods, this method uses a user-defined PBS configuration. It differs from the other two methods because the script is slightly different. Two MATLAB statements specify a MATLAB pool since this method uses neither batch() nor submit() to make the MATLAB pool. Also, the MATLAB script M-file must quit at the end of the batch process. Another significant difference: This method uses one fewer DCS license than the first and third methods.

Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements:

% FILENAME:  myscript.m

% SERIAL REGION
[c name] = system('hostname');
fprintf('SERIAL REGION:  hostname:%s\n', name)
matlabpool open 4;
fprintf('                    hostname                         numlabs  labindex\n')
fprintf('                    -------------------------------  -------  --------\n')
tic;

% PARALLEL REGION
spmd
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    fprintf('PARALLEL REGION:  %-31s  %7d  %8d\n', name,numlabs,labindex)
    pause(2);
end

% SERIAL REGION
elapsed_time = toc;          % get elapsed time in parallel region
matlabpool close force;
fprintf('\n')
[c name] = system('hostname');
name = name(1:length(name)-1);
fprintf('SERIAL REGION:  hostname:%s\n', name)
fprintf('Elapsed time in parallel region:   %f\n', elapsed_time)
quit;

% FILENAME:  myfunction.m

function result = myfunction ()

    % SERIAL REGION
    % Variable "r" is a "composite object."
    [c name] = system('hostname');
    result = sprintf('SERIAL REGION:  hostname:%s', name);
    matlabpool open 4;
    r = sprintf('                  hostname                         numlabs  labindex');
    result = strvcat(result,r);
    r = sprintf('                  -------------------------------  -------  --------');
    result = strvcat(result,r);
    tic;

    % PARALLEL REGION
    spmd
        [c name] = system('hostname');
        name = name(1:length(name)-1);
        r = sprintf('PARALLEL REGION:  %-31s  %7d  %8d', name,numlabs,labindex);
        pause(2);
    end

    % SERIAL REGION
    elapsed_time = toc;          % get elapsed time in parallel region 
    for ndx=1:length(r)          % concatenate composite object "r"
        result = strvcat(result,r{ndx});
    end
    matlabpool close force;
    [c name] = system('hostname');
    name = name(1:length(name)-1);
    r = sprintf('\nSERIAL REGION:  hostname:%s', name);
    result = strvcat(result,r);
    r = sprintf('Elapsed time in parallel region:   %f', elapsed_time);
    result = strvcat(result,r);

end

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_myscript.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the loaded versions. Set the default parallel configuration to your PBS configuration and quit MATLAB. Compile the MATLAB script M-file myscript.m:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit
$ mcc -m myscript.m

To obtain the name of the compute node that runs the compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname before the script's first echo statement so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_myscript.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \ args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/myscript $*
fi
exit
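If you recompile often, the manual edit of the generated script can be automated. The following is a sketch under the assumption that the first echo statement of the generated script starts at the beginning of a line, as in the listing above; the function name add_hostname_lines is hypothetical.

```shell
#!/bin/sh
# Sketch: write a copy of a run script to standard output with
# 'echo "<script name>"' and 'hostname' added before its first echo statement.
add_hostname_lines() {
    script=$1
    awk -v name="$(basename "$script")" '
        !inserted && /^echo / {
            print "echo \"" name "\""
            print "hostname"
            print ""
            inserted = 1
        }
        { print }
    ' "$script"
}

# Usage (not run here):
#   add_hostname_lines run_myscript.sh > run_myscript.tmp
#   mv run_myscript.tmp run_myscript.sh && chmod +x run_myscript.sh
```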

Submit the job as a single compute node with one processor core and request four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

This job runs myjob.sub on a compute node, which in turn submits the parallel job. The first job must run at least as long as the job with the parallel region since it collects the results of the parallel job.
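The wall time requirement can be checked before submitting. This is a minimal sketch with hypothetical function names; the inner value of 00:01:00 is an assumption standing in for the wall time stored in the SubmitArguments of mypbsconfig.

```shell
#!/bin/sh
# Convert an HH:MM:SS wall time to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed when the outer (client) wall time covers the inner (pool) wall time.
outer_covers_inner() {
    [ "$(to_seconds "$1")" -ge "$(to_seconds "$2")" ]
}

outer=00:05:00   # wall time given to qsub for myjob.sub
inner=00:01:00   # assumed wall time in the SubmitArguments of mypbsconfig
if outer_covers_inner "$outer" "$inner"; then
    echo "ok: $outer covers $inner"
else
    echo "warning: $outer is shorter than $inner"
fi
```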

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   1   5957 00:05 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   1   5957 00:05 R 00:00
115293.rossmann-ad myusername      standby  Job1        29390   4   4   7005 00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows that this job submits a second job with four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a639.rcac.purdue.edu
run_myscript.sh
rossmann-a639.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.

SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 2:
  PARALLEL REGION:  rossmann-a638.rcac.purdue.edu            4         2
Lab 3:
  PARALLEL REGION:  rossmann-a636.rcac.purdue.edu            4         3
Lab 4:
  PARALLEL REGION:  rossmann-a633.rcac.purdue.edu            4         4
Lab 1:
  PARALLEL REGION:  rossmann-a639.rcac.purdue.edu            4         1

Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.


SERIAL REGION:  hostname:rossmann-a639.rcac.purdue.edu
Elapsed time in parallel region:   2.930676

Output shows the name of the compute node (a639) that ran the job submission file myjob.sub and the compiler-generated script run_myscript.sh, the name of the compute node (a639) that ran the two serial regions, and the names of the four compute nodes (a639,a638,a636,a633) that ran the four scattered processor cores (four MATLAB labs) that processed the four parallel regions. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of parallel regions (4) in the pool and assigns to each lab running a parallel region a unique value for variable labindex. There are four parallel regions, so there are four lab IDs. Output shows the four labs in scrambled order since the labs process each parallel region independently of the other parallel regions. Finally, output shows the time that the four labs spent running the four parallel regions. The time implies that the four labs run in parallel. Had they run in serial, then the time would have been at least eight seconds since each of the four spmd statements pauses for two seconds.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply this method of job submission to a MATLAB function M-file, prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

Compile both the wrapper and the function, copy the generated files and the job submission file to a test directory, and submit the job from there:

$ mcc -m mywrapper.m myfunction.m
$ mkdir test
$ cp mywrapper test
$ cp run_mywrapper.sh test
$ cp myjob.sub test
$ cd test
$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running parallel loop. Secondly, increase the wall time of mypbsconfig by using the Configuration Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses acquired. Increase the value of MATLAB_Distrib_Comp_Server in the qsub command to match the new size of the pool.

The eighth method of job submission uses the MATLAB Compiler mcc to compile a MATLAB M-file with the 'local' configuration and submits the compiled file to a PBS queue.

This method is like the seventh method since it uses the same MATLAB M-files and the same job submission file myjob.sub. This method differs from the seventh since it uses the MATLAB 'local' configuration (R2011a added support for compiling PCT code on the 'local' configuration).

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the versions loaded. Set the default parallel configuration to the 'local' configuration and compile the MATLAB script M-file:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('local');
>> quit
$ mcc -m myscript.m

To obtain the name of the compute node that runs the compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname before the script's first echo statement so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_myscript.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \ args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/myscript $*
fi
exit

Submit the job as a single compute node with four processor cores:

$ qsub -l nodes=1:ppn=4,walltime=00:05:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername      standby  myjob.sub   28611   1   4    --  00:05 R 00:00

Job status shows four processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a299.rcac.purdue.edu
run_myscript.sh
rossmann-a299.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.

SERIAL REGION:  hostname:rossmann-a299.rcac.purdue.edu

Starting matlabpool using the 'local' configuration ... connected to 4 labs.
                    hostname                         numlabs  labindex
                    -------------------------------  -------  --------
Lab 1:
  PARALLEL REGION:  rossmann-a299.rcac.purdue.edu            4         1
Lab 2:
  PARALLEL REGION:  rossmann-a299.rcac.purdue.edu            4         2
Lab 3:
  PARALLEL REGION:  rossmann-a299.rcac.purdue.edu            4         3
Lab 4:
  PARALLEL REGION:  rossmann-a299.rcac.purdue.edu            4         4

Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.


SERIAL REGION:  hostname:rossmann-a299.rcac.purdue.edu
Elapsed time in parallel region:   2.583002

Output shows that processor cores on one compute node (a299) processed the entire job. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of parallel regions (4) in the pool and assigns to each lab running a parallel region a unique value for variable labindex. There are four parallel regions, so there are four lab IDs. Output shows the four labs in sequential order even though the labs process each parallel region independently of the other parallel regions. Finally, output shows the time that the four labs spent running the four parallel regions. The time implies that the four labs run in parallel. Had they run in serial, then the time would have been at least eight seconds since each of the four spmd statements pauses for two seconds.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply the eighth method of job submission to a MATLAB function M-file, prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

Compile both the wrapper and the function then submit:

$ mcc -m mywrapper.m myfunction.m
$ qsub -l nodes=1:ppn=4,walltime=00:01:00 myjob.sub

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must equal the value in the matlabpool open statement.
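Because ppn must equal the pool size in this method, it can help to read the pool size directly from the M-file before building the qsub command. This is a minimal sketch; the function names pool_size and ppn_matches_pool are hypothetical, and it assumes the pool is opened with a literal "matlabpool open N" statement as in myscript.m.

```shell
#!/bin/sh
# Print the pool size from the first "matlabpool open N" statement in an M-file.
pool_size() {
    sed -n 's/^[[:space:]]*matlabpool open \([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Succeed when the requested ppn equals the pool size found in the M-file.
ppn_matches_pool() {
    [ "$(pool_size "$1")" = "$2" ]
}

# Usage (not run here):
#   ppn_matches_pool myscript.m 4 && qsub -l nodes=1:ppn=4,walltime=00:05:00 myjob.sub
```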

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

matlabpool open 13;

$ qsub -l nodes=1:ppn=13,walltime=00:01:00 myjob.sub

{Error using matlabpool (line 136)
Failed to open matlabpool. (For information in addition to the causing error,
validate the configuration 'local' in the Configurations Manager.)

Error in myscript (line 6)
matlabpool open 13;

Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool.
    This is caused by:
    You requested a minimum of 13 workers but only 12 workers are allowed with
    the local scheduler.
}
distcomp:matlabpool:RunValidation

For more information about MATLAB Parallel Computing Toolbox:

MATLAB Distributed Computing Server (distributed job)

The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. PCT enables task and data parallelism on a multicore processor. It offers a shared-memory computing environment in which, in addition to your MATLAB client, a maximum of eight MATLAB workers (labs, threads) in version R2009a or 12 workers in version R2011a run on the 'local' configuration. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the coarse-grained parallelism of several independent tasks in a distributed job. The tasks of a distributed job may be identical or similar, but can also be completely different from one another. The tasks do not communicate with each other and need not run simultaneously. A multi-core compute node might run one task or several tasks in parallel and/or in succession. Areas of application include embarrassingly parallel computations, such as parameter sweeps.

This section illustrates two methods of submitting a small MATLAB distributed job with several identical but independent tasks to a PBS queue. Each task displays the name of the compute node that runs it by calling the MATLAB function system() with the command hostname, which returns two values: a numerical status code and the name of the compute node that runs the command. This section also explains what happens when the number of tasks exceeds the number of available DCS licenses.

The first method runs a MATLAB client on a front end; the client calls the MATLAB submit() function with the 'torque' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'torque' scheduler, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, for the 27 tasks of the example below, it requires and checks out 29 licenses: one MATLAB license for the client running on the front end, one PCT license, and 27 DCS licenses (one per task). The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. The DCS licenses remain active between running function submit() and job completion. The DCS licenses do not appear in the output of function license().

The second method uses the PBS qsub command to submit a job to a PBS queue. This method runs on a compute node a MATLAB client which uses the 'local' configuration to run all tasks. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. This job is completely off the front end.

The following table summarizes MATLAB license usage:

Method                                   MATLAB  PCT  DCS              mcc
submit() with 'torque' scheduler         1       1    number of tasks  0
submit() with 'local' scheduler, qsub    1       1    0                0
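The license arithmetic for the 'torque' scheduler row can be sketched in shell. The function names are hypothetical; the gres string follows the format of the SubmitArguments used in this section.

```shell
#!/bin/sh
# Total licenses checked out by submit() with the 'torque' scheduler, per the
# table: one MATLAB license, one PCT license, and one DCS license per task.
torque_license_total() {
    echo $((1 + 1 + $1))
}

# Matching gres request for the scheduler's SubmitArguments.
torque_gres() {
    echo "gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+$1"
}

tasks=27
echo "licenses: $(torque_license_total "$tasks")"
torque_gres "$tasks"
```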

For the first method of job submission, prepare a MATLAB script M-file which finds the 'torque' scheduler, defines a distributed job and 27 tasks, and calls MATLAB function submit(). Use an appropriate filename, here named mypbssubmit.m:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','type','torque');
set(sched,'ClusterMatlabRoot',matlabroot);
set(sched,'SubmitArguments','-l walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+27');
pjob = createJob(sched);

% Make several new tasks in a job.
% Here, the number of tasks is three more than the number of processor cores per compute node.
for i = 1:27
    task = createTask(pjob,@system,2,{'hostname'});
end

% To run your functions instead of a system function, set the
% necessary file dependencies.  This tells a MATLAB worker
% (compute node) where to find the files of your functions.
% set(pjob,'FileDependencies',{'myf_1.m','myf_2.m','myf_3.m'});
% createTask(pjob,@myf_1,1,{});
% createTask(pjob,@myf_2,1,{});
% createTask(pjob,@myf_3,1,{});

submit(pjob);
disp('FINISHED SUBMITTING')

On a front end, load the MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, verify what the default parallel configuration is. Either the 'local' configuration or a PBS configuration will work and will yield the same result. Then run mypbssubmit.m. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, get all output arguments into a cell array and display the results. When you do not want to keep the files of a job, use MATLAB function destroy() to erase them from your current working directory:

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay -singleCompThread
>> defaultParallelConfig('local')
>> mypbssubmit
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> disp(pjob.get('State'))
finished
>> result = getAllOutputArguments(pjob)

result =
   [0]   [1x28 char]
   [0]   [1x28 char]
   [0]   [1x28 char]
   ...
   [0]   [1x28 char]

>> result{1:54}
>> pjob.destroy;
>> quit;

View job status from qstat:


rossmann-adm.rcac.purdue.edu: 
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
160901.rossmann-ad     kes      standby  Job4Task1           --      1   1    --  00:01 Q   -- 
160902.rossmann-ad     kes      standby  Job4Task2           --      1   1    --  00:01 Q   -- 
160903.rossmann-ad     kes      standby  Job4Task3           --      1   1    --  00:01 Q   -- 
...
160927.rossmann-ad     kes      standby  Job4Task27          --      1   1    --  00:01 Q   -- 

View the results of the tasks (in an abbreviated form):

ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a004.rcac.purdue.edu
ans = rossmann-a004.rcac.purdue.edu
ans = rossmann-a004.rcac.purdue.edu
ans = rossmann-a004.rcac.purdue.edu
ans = rossmann-a004.rcac.purdue.edu
ans = rossmann-a005.rcac.purdue.edu
ans = rossmann-a005.rcac.purdue.edu
ans = rossmann-a005.rcac.purdue.edu
ans = rossmann-a005.rcac.purdue.edu
ans = rossmann-a005.rcac.purdue.edu
ans = rossmann-a006.rcac.purdue.edu
...
ans = rossmann-a006.rcac.purdue.edu

Several compute nodes participated in the processing.

When a task does not get a license, it does not run. Suppose Task8 did not get a license. View the log file for Task8, Job1.Task8.log:

/var/spool/PBS/mom_priv/jobs/1808906[8].rossmann-adm.rcac.purdue.edu.SC: line 29: cd: /tmp/pbs.1808906[8].rossmann-adm.rcac.purdue.edu: No such file or directory
Executing: /apps/rhel5/MATLAB/R2011b/bin/worker
License checkout failed.
License Manager Error -4
Maximum number of users for MATLAB_Distrib_Comp_Engine reached.
Try again later.
To see a list of current users use the lmstat utility or contact your License Administrator.

Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2011b/4

Diagnostic Information:
Feature: MATLAB_Distrib_Comp_Engine
License path: /home/myusername/.matlab/R2011b_licenses:/apps/rhel5/MATLAB/R2011b/licenses/license.dat:/apps/rhel5/MATLAB_R2011b/licenses/ecn.lic:/apps/rhel5/MATLAB/R2011b/licenses/mdce.lic
FLEXnet Licensing error: -4,132.
MATLAB exited with code: 1

The log file for Task8 reads, "License checkout failed." Perhaps other tasks (including other users' tasks) had already taken all available licenses.
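With many tasks, it is quicker to search all task logs for this message than to open each one. This is a minimal sketch that assumes the logs are named like Job1.Task8.log, as in the example above, and sit in one directory; the function name license_failures is hypothetical.

```shell
#!/bin/sh
# List the task log files in a directory that report a failed license checkout.
# Assumes log names of the form Job<N>.Task<M>.log.
license_failures() {
    grep -l 'License checkout failed' "$1"/Job*.Task*.log 2>/dev/null
}

# Usage (not run here):
#   license_failures .
```

Tasks listed by this search did not run and need to be resubmitted once licenses are available.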

After the "FINISHED SUBMITTING" message, you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with PBS qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy(), so you may rerun MATLAB and find your job:

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','type','torque');
>> job=findJob(sched,'State','finished');
>> result=getAllOutputArguments(job)

result =
   [0]   [1x28 char]
   [0]   [1x28 char]
   [0]   [1x28 char]
   ...
   [0]   [1x28 char]

>> result{1:54}
>> job.destroy;
>> quit;
$

The second method of submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first method since it uses a MATLAB script M-file. It differs from the first method because the script is slightly different. This script can wait for all tasks to finish. Also, this method differs from the first because the MATLAB client runs on a compute node rather than on the front end. Either the 'local' configuration or a PBS configuration will work and will yield the same result.

Prepare a MATLAB script M-file with an appropriate filename, here named mylclsubmit.m:

% FILENAME:  mylclsubmit.m

sched = findResource('scheduler','type','local');
set(sched,'ClusterMatlabRoot',matlabroot);
job = createJob(sched);
task = createTask(job,@system,2,{{'hostname'},{'hostname'},{'hostname'},{'hostname'},{'hostname'},{'hostname'},{'hostname'},{'hostname'},{'hostname'}});
submit(job);
disp('FINISHED SUBMITTING')

waitForState(job);
results = getAllOutputArguments(job);
results{1:18}
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclsubmit

Submit the job with either the 'local' configuration or a PBS configuration as the default parallel configuration; the result will be the same. Request one license from the Parallel Computing Toolbox (PCT):

$ qsub -l nodes=1,gres=Parallel_Computing_Toolbox+1 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                         Req'd  Req'd   Elap
Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----
161181.rossmann-ad   myusername standby  Job1Task1            --     1   1     -- 00:01 Q    --

View results (in an abbreviated form) in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a002.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

FINISHED SUBMITTING

ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a002.rcac.purdue.edu
ans = rossmann-a002.rcac.purdue.edu
...
ans = rossmann-a002.rcac.purdue.edu

Output shows that one compute node (a002) processed all tasks of the distributed job.

Any output written to standard error will appear in myjob.sub.emyjobid.
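
The names of the two output files follow from the job name and the job ID that qsub prints. The following is a minimal sketch of that naming rule, not part of the original example; the function name pbs_output_files is hypothetical, and the job ID shown is only illustrative. It assumes the job name is the submission file name (no -N option) and that the job is not an array job.

```shell
#!/bin/sh
# Sketch: derive the PBS output file names for a job.
# Assumption: qsub printed a job ID such as 161181.rossmann-adm.rcac.purdue.edu
# and the job name is the submission file name (no -N option given).

# pbs_output_files JOBNAME JOBID   (hypothetical helper)
# Prints the standard output file name, then the standard error file name.
pbs_output_files() {
    jobname=$1
    # Keep only the sequence number that precedes the first dot of the job ID.
    seq=${2%%.*}
    printf '%s.o%s\n%s.e%s\n' "$jobname" "$seq" "$jobname" "$seq"
}

pbs_output_files myjob.sub 161181.rossmann-adm.rcac.purdue.edu
```

For the illustrative job ID above, the function prints myjob.sub.o161181 and myjob.sub.e161181.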

For more information about distributed jobs:

MATLAB Distributed Computing Server (parallel job)

The MATLAB Parallel Computing Toolbox (PCT) offers a parallel job via the MATLAB Distributed Computing Server (DCS). The tasks of a parallel job are identical, run simultaneously on several MATLAB workers (labs), and communicate with each other. PCT offers a distributed-memory computing environment with a maximum of eight MATLAB workers (labs, MPI ranks; version R2009a) and 12 workers (labs, MPI ranks; version R2011a) running on the local configuration. Moreover, the MATLAB Distributed Computing Server (DCS) scales up PCT applications to the limit of your DCS licenses. This section illustrates an MPI-like program. Areas of application include distributed arrays and message passing.

This section illustrates ten methods of submitting a small MATLAB parallel job with four workers running one MPI-like task to a PBS queue. The MATLAB program broadcasts an integer, which might be the number of slices of a numerical integration, to four workers and gathers the names of the compute nodes running the workers and the lab IDs of the workers. The system function hostname returns two values: a numerical code and the name of the compute node that runs the program.

The first method runs on a front end a MATLAB client which calls the MATLAB batch() function with a user-defined PBS configuration. Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out seven licenses: one MATLAB license for the client running on the front end, one PCT license, and five DCS licenses. One DCS license runs the batch session (including the two serial regions in the MATLAB M-code) while the other four run the spmd statement. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running function batch() and quitting MATLAB. The five DCS licenses remain active between running function batch() and job completion, even when the serial portion of the program is running. The DCS licenses do not appear in the output of function license().

The second method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB batch() function with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The third method runs on a front end a MATLAB client which calls the MATLAB submit() function with the 'torque' scheduler. When you need more granularity in your submission, use submit(). Since it uses the 'torque' scheduler, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out six licenses: one MATLAB license for the client running on the front end, one PCT license, and four DCS licenses. The four DCS licenses run the four labs of the parallel job; unlike batch(), submit() needs no extra license for a batch session. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. The four DCS licenses remain active between running function submit() and job completion. The DCS licenses do not appear in the output of function license().

The fourth method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB submit() function with the 'local' configuration. When you need more granularity in your submission, use submit(). Since it uses the 'local' scheduler, this method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox; so, it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains active between starting and quitting MATLAB. The PCT license remains active between running a PCT function, such as findResource(), and quitting MATLAB. This job is completely off the front end.

The fifth method uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS configuration which scatters the MATLAB workers onto different compute nodes. Since it uses a PBS configuration, this method, when executed, uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server; so, it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the parallel job. This job is completely off the front end.

The sixth method uses the PBS qsub command to submit to a compute node a MATLAB client which interprets an M-file with the 'local' configuration. Since it uses the 'local' configuration, this method, when executed, uses the MATLAB interpreter; so, it requires and checks out one license: one MATLAB license for the client running on the compute node. The MATLAB license remains active between starting and quitting MATLAB. This job is completely off the front end.

The seventh method uses the MATLAB Compiler mcc and the default parallel configuration set to a PBS configuration to compile a MATLAB M-file and submits the compiled file to a PBS queue. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses a PBS configuration during compilation, this method, when executed, uses the MATLAB Distributed Computing Server; so, it requires and checks out four DCS licenses during the execution of the spmd statement. This job is completely off the front end.

The eighth method uses the MATLAB Compiler mcc and the default parallel configuration set to the 'local' configuration to compile a MATLAB M-file and submits the compiled file to a PBS queue (support for running compiled PCT code on the 'local' configuration was added in R2011a; this feature removes the need for DCS licenses in some cases). It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses the 'local' configuration, this method, when executed, uses no license. This job is completely off the front end.

The ninth method uses the MATLAB Compiler mcc, qsub, and your PBS configuration to compile a parallel job. This method is the third method compiled. This method compiles a script M-file which contains the specifications of job submission plus the code of the parallel job in a function M-file. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses a PBS configuration during compilation, this method, when executed, uses the MATLAB Distributed Computing Server; so, it requires and checks out four DCS licenses during the execution of the spmd statement. This job is completely off the front end.

The tenth method uses the MATLAB Compiler mcc, qsub, and the MATLAB 'local' configuration to compile a parallel job. This method is the fourth method compiled. This method compiles a script M-file which contains the specifications of job submission plus the code of the parallel job in a function M-file. It requires and checks out one MATLAB Compiler license. This license remains checked out for 30 minutes after the compilation completes. Since it uses the 'local' configuration, this method, when executed, uses no license. This job is completely off the front end.

The MATLAB Compiler license is a lingering license. Using the compiler locks its license for at least 30 minutes. Each time you issue the compiler command, you reset the 30-minute timer. When running mcc at the MATLAB prompt, you hold the license as long as MATLAB remains open. To release the license, quit MATLAB. Quitting MATLAB is necessary to release the license, but not sufficient. When you quit MATLAB before the 30-minute timer runs out, then the license remains checked out for the remaining time. When running mcc at the Linux prompt, the 30-minute timer starts or restarts. The license manager tracks MATLAB licenses per user per cluster. If you run the MATLAB Compiler on two different clusters, you use two Compiler licenses. To minimize your license usage, run the Compiler on one cluster.

You can share your compiled program with colleagues who do not have MATLAB licenses. Your compiled program will take its required licenses within itself. So, you do not need to install any license file on your colleague's machine.

The following table summarizes MATLAB license usage:

Method  Description                                         MATLAB  PCT  DCS                     mcc  Limitations
------  --------------------------------------------------  ------  ---  ----------------------  ---  -----------------------------------------------------------
1       batch() with user-defined PBS configuration         1       1    Matlabpool + 1          0    number of MATLAB, PCT, DCS licenses purchased
2       batch() with 'local' configuration, qsub            1       1    0                       0    local configuration with 8 (R2009a) and 12 (R2011a) workers
3       submit() with 'torque' scheduler                    1       1    MaximumNumberOfWorkers  0    number of MATLAB, PCT, DCS licenses purchased
4       submit() with 'local' scheduler, qsub               1       1    0                       0    local scheduler with 8 (R2009a) and 12 (R2011a) workers
5       qsub with user-defined PBS configuration            1       1    pool size               0    number of MATLAB, PCT, DCS licenses purchased
6       qsub with 'local' configuration                     1       1    0                       0    local configuration with 8 (R2009a) and 12 (R2011a) workers
7       Compiler with user-defined PBS configuration, qsub  0       0    pool size               1    number of DCS licenses purchased
8       Compiler with 'local' configuration, qsub           0       0    0                       1    local configuration with 8 (R2009a) and 12 (R2011a) workers
9       Compiler with user-defined PBS configuration, qsub  0       0    MaximumNumberOfWorkers  1    number of DCS licenses purchased
10      Compiler with 'local' configuration, qsub           0       0    0                       1    local configuration with 8 (R2009a) and 12 (R2011a) workers
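
To make the DCS column concrete, here is a small sketch that returns the number of DCS licenses the table lists for a given method and pool size. The function name dcs_licenses is hypothetical, and the sketch treats "pool size" and MaximumNumberOfWorkers as the same number of workers; it only restates the table and does not query the license manager.

```shell
#!/bin/sh
# dcs_licenses METHOD POOLSIZE   (hypothetical helper)
# Prints the number of DCS licenses that the summary table lists
# for a submission method (1-10) and a given number of workers.
dcs_licenses() {
    method=$1
    pool=$2
    case $method in
        1)          echo $((pool + 1)) ;;  # batch() on PBS: pool plus the batch session
        3|5|7|9)    echo "$pool" ;;        # other PBS-based methods: one per worker
        2|4|6|8|10) echo 0 ;;              # 'local' configuration: no DCS licenses
        *)          echo "unknown method: $method" >&2; return 1 ;;
    esac
}

dcs_licenses 1 4    # batch() with a four-lab pool on a PBS configuration
dcs_licenses 3 4    # submit() with four workers on the 'torque' scheduler
dcs_licenses 8 4    # compiled code on the 'local' configuration
```

With a pool of four workers, the three calls print 5, 4, and 0, which match the license counts described for the first, third, and eighth methods.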

Prepare a MATLAB parallel program in the form of a MATLAB script M-file and a MATLAB function M-file with appropriate filenames, here named myscript.m and myfunction.m:

% FILENAME:  myscript.m

if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if (labindex == 1)
     disp(result)
end

% FILENAME:  myfunction.m

function result = myfunction ()

   if labindex == 1
       % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
       N = labBroadcast(1,int64(1000));
   else
       % Each lab (rank) receives the broadcast value from lab (rank) #1.
       N = labBroadcast(1);
   end

   % Form a string with host name, total number of labs, lab ID, and broadcast value.
   [c name] =system('hostname');
   name = name(1:length(name)-1);
   fmt = num2str(floor(log10(numlabs))+1);
   str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

   % Apply global concatenation to all str's.
   % Store the concatenation of str's in the first dimension (row) and on lab #1.
   result = gcat(str,1,1);

end

Both M-files collect the names of all compute nodes which run the job and the associated lab IDs. The script M-file uses disp() to display the results. The function M-file returns a single value which contains a concatenation of the results.

The execution of a parallel job has all copies of the single task running concurrently and perhaps also communicating with each other.

The first method of job submission uses function batch() to offload work from the MATLAB client running on the front end to a batch session running on a worker (compute node). The batch session supervises another set of workers, called the pool, and accumulates the results. Since function batch() accepts only MATLAB pool jobs, you must convert the parallel job to a pool job by surrounding the code with an spmd statement (MATLAB function batch() accepts either a script M-file or a function M-file). Use appropriate filenames, here named myscript.m and myfunction.m:

% FILENAME:  myscript.m

% Convert this parallel job to a pool job.
spmd

if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if labindex == 1
    disp(result)
end

end   % spmd

% FILENAME:  myfunction.m

function result = myfunction ()

    result = 0;

    % Convert this parallel job to a pool job.
    spmd

    if labindex == 1
        % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
        N = labBroadcast(1,int64(1000));
    else
        % Each lab (rank) receives the broadcast value from lab (rank) #1.
        N = labBroadcast(1);
    end

    % Form a string with host name, total number of labs, lab ID, and broadcast value.
    [c name] =system('hostname');
    name = name(1:length(name)-1);
    fmt = num2str(floor(log10(numlabs))+1);
    str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

    % Apply global concatenate to all str's.
    % Store the concatenation of str's in the first dimension (row) and on lab #1.
    rslt = gcat(str,1,1);
    if labindex == 1
        disp(rslt)
    end

    end   % spmd
    result = rslt{1};

end   % function

On the front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, view the current default parallel configuration; most likely, it is the 'local' configuration. Call MATLAB function batch() to make a four-lab pool on which to run the MATLAB code in the file myscript.m. In the call, replace the 'local' configuration by specifying your PBS configuration which you previously designed with the Configuration Manager (using the 'local' configuration at this point will run the spmd statement on the front end). This particular PBS configuration scatters the labs to different compute nodes to verify that a four-lab spmd statement actually uses five processor cores. Also, capture job output in the diary. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, view results by viewing the diary. When you do not want to keep the files of your job, use MATLAB function destroy() to erase them from your current working directory. After this job finishes, the default parallel configuration remains the 'local' configuration.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> disp(defaultParallelConfig);
local
>> pjob=batch('myscript','Matlabpool',4,'Configuration','mypbsconfig','CaptureDiary',true);
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> license('inuse')
distrib_computing_toolbox
matlab
>> disp(pjob.get('State'))
finished
>> license('inuse')
distrib_computing_toolbox
matlab
>> pjob.diary;
>> ls -l
>> pjob.destroy;
>> ls -l
>> disp(defaultParallelConfig);
local
>> quit;
$

The license name distrib_computing_toolbox refers to the Parallel Computing Toolbox.

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
156491.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --

Job status shows that the MATLAB client submitted this job as five compute nodes (NDS), each with one processor core (TSK), and with a wall time of one minute. The call to function batch() specifies a pool of four labs to evaluate the four copies of the parallel region (spmd statement). The fifth lab runs the batch session, myscript.m, and accumulates the results. This arrangement explains the need for five DCS licenses.
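
When you check many jobs, it can help to reduce the qstat listing to the columns discussed here. The following sketch is an illustration only; the function name qstat_summary is hypothetical, and it assumes each job line starts with a digit and has the 11 columns shown in the listing above.

```shell
#!/bin/sh
# qstat_summary   (hypothetical helper)
# Reads the output of "qstat -u myusername" on standard input and prints
# "jobid nodes cores state" for each job line.
# Assumption: job lines start with a digit and have the 11 columns shown above.
qstat_summary() {
    awk '/^[0-9]/ && NF >= 11 { print $1, $6, $7, $10 }'
}

# Demonstration with the listing from this section; on the cluster you
# would instead run:  qstat -u myusername | qstat_summary
qstat_summary <<'EOF'
rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
156491.rossmann-a myusername standby  Job1          --    5   5    --  00:01 Q   --
EOF
```

For the listing above, the summary line is "156491.rossmann-a 5 5 Q": five compute nodes, five processor cores, and a queued job.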

View job output from the diary:

  
Lab 1: 
  rossmann-a001.rcac.purdue.edu:4:1:1000   
  rossmann-a002.rcac.purdue.edu:4:2:1000   
  rossmann-a003.rcac.purdue.edu:4:3:1000   
  rossmann-a004.rcac.purdue.edu:4:4:1000  

Output shows that the nonnegative scalar integer which is the value of the property Matlabpool (4) defined the number of labs in the pool which processed the parallel region. The fifth worker, which runs the batch session, does not appear in the output since there is no serial region. Because the MATLAB pool requires the lab running the batch session in addition to N labs in the pool, there must be at least N+1 processor cores available on the cluster.

The MATLAB client scattered the five processor cores (five MATLAB labs) among five different compute nodes. A processor core of one compute node processed the batch session while four processor cores of four different compute nodes (a001,a002,a003,a004) processed the parallel regions. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() collected in Lab 1 and from each parallel region the name of the compute node.
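
Because each result line has the form host:numlabs:labindex:value, you can check the scattering by counting distinct host names. This is a minimal sketch under that assumption; the function name count_nodes is hypothetical, and the input is the diary text saved or pasted from the session above.

```shell
#!/bin/sh
# count_nodes   (hypothetical helper)
# Reads result lines of the form host:numlabs:labindex:value on standard
# input and prints the number of distinct host names.
# Lines with fewer than four colon-separated fields (such as "Lab 1:") are ignored.
count_nodes() {
    awk -F: 'NF >= 4 { sub(/^[ \t]+/, "", $1); hosts[$1] = 1 }
             END { n = 0; for (h in hosts) n++; print n }'
}

# Demonstration with the diary output shown above.
count_nodes <<'EOF'
Lab 1:
  rossmann-a001.rcac.purdue.edu:4:1:1000
  rossmann-a002.rcac.purdue.edu:4:2:1000
  rossmann-a003.rcac.purdue.edu:4:3:1000
  rossmann-a004.rcac.purdue.edu:4:4:1000
EOF
```

For this diary the count is 4, one compute node per lab. A run on the 'local' configuration, where all labs share one compute node, would give 1.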

After calling the MATLAB function batch(), you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with the PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy().

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','Configuration','mypbsconfig');
>> pjob=findJob(sched,'State','finished');
>> pjob.diary;
>> pjob.destroy;
>> quit;
$

To apply the first method of job submission to a function M-file, use one of the following sequences:

>> pjob=batch('myfunction','Matlabpool',4,'Configuration','mypbsconfig','CaptureDiary',true);
>> disp(pjob.get('State'))
finished
>> pjob.diary

>> pjob=batch('myfunction',1,{},'Matlabpool',4,'Configuration','mypbsconfig');                    
>> disp(pjob.get('State'))
finished
>> result=getAllOutputArguments(pjob);
>> result{1}

>> pjob=batch(@myfunction,1,{},'Matlabpool',4,'Configuration','mypbsconfig');
>> disp(pjob.get('State'))
finished
>> result=getAllOutputArguments(pjob);
>> result{1}

Function batch() offers an alternate mode of operation. If you exclude the property Configuration and its value from the list of arguments of function batch(), then batch() reads the default parallel configuration. Before calling batch(), set the default parallel configuration with your PBS configuration which you previously designed with the Configuration Manager. This change of the default parallel configuration is immediate and permanent; the PBS configuration remains the default parallel configuration during subsequent runs of MATLAB. MATLAB function batch() reads the default parallel configuration to discover your instructions about submission.

To scale up this method to handle a real application, use the Configuration Manager in the Parallel menu to increase the wall time to accommodate a longer running job. Enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch(). The maximum possible size of the pool is one less than the number of DCS licenses purchased.

The second method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first method since it uses function batch() and the same MATLAB M-file, either myscript.m or myfunction.m, modified from a parallel job to a pool job with an spmd statement. This method differs from the first because the MATLAB client runs on a compute node rather than on the front end and the client uses the MATLAB 'local' configuration rather than a user-defined PBS configuration.

Prepare a MATLAB script M-file that calls MATLAB function batch() which makes a four-lab pool on which to run the MATLAB code in the file myscript.m, which specifies the 'local' configuration, and which captures job output in the diary. Use an appropriate filename, here named mylclbatch.m:

% FILENAME:  mylclbatch.m

!echo "mylclbatch.m"
!hostname

pjob=batch('myscript','Matlabpool',4,'Configuration','local','CaptureDiary',true);
pjob.wait;
pjob.diary
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclbatch

Submit the job as a single compute node with six processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=6,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclbatch.m; one processor core runs the batch session; four processor cores run the four copies of the parallel region.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
99025.rossmann-ad myusername      standby  myjob.sub   30197   1   6    --  00:01 R 00:00

Job status shows six processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a017.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

mylclbatch.m
rossmann-a017.rcac.purdue.edu
Lab 1:
  rossmann-a017.rcac.purdue.edu:4:1:1000
  rossmann-a017.rcac.purdue.edu:4:2:1000
  rossmann-a017.rcac.purdue.edu:4:3:1000
  rossmann-a017.rcac.purdue.edu:4:4:1000

Output shows that the nonnegative scalar integer which is the value of the property Matlabpool (4) defined the number of labs (4) in the pool which processed the parallel region. While output does not explicitly show the fifth worker, that worker runs the batch session myscript.m. Because the MATLAB pool requires the lab running the batch session in addition to the N labs in the pool, there must be at least N+1 processor cores available on the compute node.

Output shows that processor cores on one compute node (a017) processed the entire job. One processor core processed myjob.sub and mylclbatch.m. One processor core processed the batch session while four processor cores processed the parallel regions. Variable numlabs shows the total number of labs (4). Variable labindex shows the ID of an individual lab. There are four labs, so there are four lab IDs.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply the second method of job submission to a function M-file, modify mylclbatch.m with one of the following sequences:

pjob=batch('myfunction','Matlabpool',4,'Configuration','local','CaptureDiary',true);
pjob.wait;
pjob.diary
pjob.load;
disp(ans)
result = getAllOutputArguments(pjob);
disp(result{1}.ans)

pjob=batch('myfunction',1,{},'Matlabpool',4,'Configuration','local');
pjob.wait;
result = getAllOutputArguments(pjob);
disp(result{1})

pjob=batch(@myfunction,1,{},'Matlabpool',4,'Configuration','local');
pjob.wait;
result = getAllOutputArguments(pjob);
disp(result{1})

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch(). The maximum possible size of the pool is one less than the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must be two greater than the value of Matlabpool.

Specifying a MATLAB pool with 12 (R2011a) labs means a total of 13 workers. This exceeds the 'local' configuration. The relevant lines of code and the error follow:

pjob=batch('myscript','Matlabpool',12,'Configuration','local','CaptureDiary',true);

$ qsub -l nodes=1:ppn=14,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using batch (line 172)
You requested a minimum of 13 workers but only 12 workers are allowed with the
local scheduler.

Error in mylclbatch (line 6)
pjob=batch('myscript','Matlabpool',12,'Configuration','local','CaptureDiary',true);}
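
The sizing rules for this method can be checked before submission. The following sketch is one possible helper, not part of the original example; the function name local_batch_resources is hypothetical. It assumes the R2011a limit of 12 workers by default and prints a resource string in the same form as the qsub commands above.

```shell
#!/bin/sh
# local_batch_resources POOLSIZE WALLTIME [LIMIT]   (hypothetical helper)
# Prints a "qsub -l" resource string for batch() with the 'local' configuration.
# LIMIT is the worker limit of the 'local' configuration (default 12, as in R2011a).
local_batch_resources() {
    pool=$1
    walltime=$2
    limit=${3:-12}
    # The pool plus the batch session must fit within the worker limit.
    if [ $((pool + 1)) -gt "$limit" ]; then
        echo "pool of $pool needs $((pool + 1)) workers; limit is $limit" >&2
        return 1
    fi
    # One core for the client, one for the batch session, one per lab.
    ppn=$((pool + 2))
    echo "nodes=1:ppn=$ppn,walltime=$walltime,gres=Parallel_Computing_Toolbox+1"
}

local_batch_resources 4 00:01:00
```

For a four-lab pool the function prints nodes=1:ppn=6,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1, which you could pass to qsub as: qsub -l "$(local_batch_resources 4 00:01:00)" myjob.sub. For a 12-lab pool it reports the problem and returns a nonzero status instead of printing a resource string.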

The third method of job submission uses function submit() to submit a parallel job to a PBS queue. Function submit() can accept a parallel job in the form of a MATLAB function M-file myfunction.m (MATLAB function submit() accepts only a function M-file).

Prepare a MATLAB script M-file which finds the MATLAB 'torque' scheduler, which specifies scattering four processor cores to four different compute nodes and one minute of wall time for this short job and the number of PCT and DCS licenses, which defines a MATLAB parallel job with four workers (labs) and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mypbssubmit.m:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','type','torque');
set(sched,'ClusterMatlabRoot',matlabroot);
set(sched,'SubmitArguments','-l walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4');
pjob=createParallelJob(sched);
set(pjob,'MinimumNumberOfWorkers',4);
set(pjob,'MaximumNumberOfWorkers',4);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');
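
The value of SubmitArguments is an ordinary qsub option string, so it can be assembled and inspected outside MATLAB before you paste it into the script. The sketch below is an illustration only; the function name torque_submit_args is hypothetical. It assumes one PCT license and one DCS license per worker, as in mypbssubmit.m, and accepts an optional node specification.

```shell
#!/bin/sh
# torque_submit_args WORKERS WALLTIME [NODESPEC]   (hypothetical helper)
# Prints a SubmitArguments string that requests one PCT license and one
# DCS license per worker. NODESPEC (for example nodes=1:ppn=4) is optional.
torque_submit_args() {
    workers=$1
    walltime=$2
    nodespec=${3:-}
    gres="gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+$workers"
    if [ -n "$nodespec" ]; then
        echo "-l $nodespec,walltime=$walltime,$gres"
    else
        echo "-l walltime=$walltime,$gres"
    fi
}

torque_submit_args 4 00:01:00
torque_submit_args 4 00:01:00 nodes=1:ppn=4
```

The first call reproduces the string used in mypbssubmit.m; the second adds a request for four processor cores on a single compute node.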

On a front end, load a MATLAB module and verify the version of the MATLAB module loaded. Run a MATLAB client. At the MATLAB prompt, run mypbssubmit. To monitor when your job finishes, run the PBS command qstat in a shell (exclamation point "!") or MATLAB function get(). After your job finishes, get all output arguments into a cell array and display the results. When you do not want to keep the files of a job, use MATLAB function destroy() to erase them from your current working directory.

$ module load matlab/R2011b
$ module list
  1) matlab/R2011b
$ matlab -nodisplay
>> mypbssubmit
FINISHED SUBMITTING
>> !qstat -u myusername
>> disp(pjob.get('State'))
queued
>> disp(pjob.get('State'))
running
>> disp(pjob.get('State'))
finished
>> result = getAllOutputArguments(pjob)

result = 

    [4x39 char]
    []
    []
    []

>> result{1}
>> ls -l
>> pjob.destroy
>> ls -l
>> quit
$

View job status from qstat:


rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
157168.rossmann-a myusername standby  Job1          --    4   4    --  00:01 Q   -- 

Job status shows that the MATLAB client submitted this job as four compute nodes (NDS), each with one processor core (TSK), and the requested wall time of one minute. The call to function submit() specifies four labs as the minimum and maximum number of labs of the MATLAB pool. Four labs evaluate the four copies of the parallel job.

View job output:

rossmann-a000.rcac.purdue.edu:4:1:1000   
rossmann-a001.rcac.purdue.edu:4:2:1000   
rossmann-a002.rcac.purdue.edu:4:3:1000   
rossmann-a003.rcac.purdue.edu:4:4:1000

Output shows that the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (4), a nonnegative scalar integer, defined the number of labs (4) in the pool. Because the MATLAB pool requires N labs, there must be at least N processor cores available on the cluster.

The MATLAB client scattered the four processor cores (four MATLAB labs) among four different compute nodes. Four processor cores on four different compute nodes (a000,a001,a002,a003) processed the four labs of the parallel job. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Finally, each lab received the broadcast value: 1,000.
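To check quickly whether the labs were scattered across nodes or packed onto one, the distinct host names in the job output can be counted. The following is only a sketch; the function name count_hosts is hypothetical, and it assumes output lines of the form host:numlabs:labindex:value as shown above:

```shell
#!/bin/sh
# Hypothetical helper: count distinct host names in lines of the form
# host:numlabs:labindex:value read from stdin. Lines with fewer than
# four colon-separated fields (such as "Lab 1:") are ignored.
count_hosts() {
    awk -F: 'NF >= 4 { gsub(/^[ \t]+/, "", $1); hosts[$1] = 1 }
             END { n = 0; for (h in hosts) n++; print n }'
}

# Demonstration with the scattered output from this example (prints 4).
count_hosts <<'EOF'
rossmann-a000.rcac.purdue.edu:4:1:1000
rossmann-a001.rcac.purdue.edu:4:2:1000
rossmann-a002.rcac.purdue.edu:4:3:1000
rossmann-a003.rcac.purdue.edu:4:4:1000
EOF
```

A result equal to the number of labs suggests one lab per node; a result of 1 suggests that a single compute node processed the entire job.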

After the "FINISHED SUBMITTING" message, you may terminate your MATLAB client and perform other chores while your compute-intensive job runs. You may query your job with PBS command qstat entered at the Linux prompt, just like for any other PBS job. MATLAB will make job-related files in either the current working directory or a directory specified with DataLocation. These files exist until you call the MATLAB function destroy().

$ module load matlab/R2011b
$ matlab -nodisplay
>> sched=findResource('scheduler','type','torque');
>> pjob=findJob(sched,'State','finished');
>> result=getAllOutputArguments(pjob);
>> result{1}
>> pjob.destroy
>> quit
$

For practice, modify mypbssubmit.m to rerun this example as a single compute node with four processor cores:

set(sched,'SubmitArguments','-l nodes=1:ppn=4,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4');

View job status from qstat:

rossmann-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
115202.rossmann-a myusername standby  Job1          --    1   4    --  00:01 R   --

View job output:

rossmann-a000.rcac.purdue.edu:4:1:1000   
rossmann-a000.rcac.purdue.edu:4:2:1000   
rossmann-a000.rcac.purdue.edu:4:3:1000   
rossmann-a000.rcac.purdue.edu:4:4:1000 

Output shows that processor cores of one compute node (a000) processed the entire job.

Function submit() offers an alternate mode of operation. It can read your PBS configuration made previously with the Configuration Manager. Here is an alternate version of mypbssubmit.m which relies on mypbsconfig for the several details of job submission:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','Configuration','mypbsconfig');
pjob=createParallelJob(sched);
set(pjob,'MinimumNumberOfWorkers',4);
set(pjob,'MaximumNumberOfWorkers',4);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

To scale up this method to handle a real application, increase the wall time to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of DCS licenses purchased. Finally, increase the value of MATLAB_Distrib_Comp_Server to match the value of MinimumNumberOfWorkers.
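When scaling up, the wall time and the license counts in the resource string need to change together. A minimal sketch of a shell helper that assembles such a string follows; the function name matlab_resources and its argument order are assumptions made for illustration, and the output is meant to be pasted into SubmitArguments or a qsub -l option:

```shell
#!/bin/sh
# Hypothetical helper: build a PBS resource string for a MATLAB job.
# Arguments: nodes ppn walltime pct_licenses dcs_licenses
# A license count of 0 omits that license from the gres request.
matlab_resources() {
    nodes=$1; ppn=$2; walltime=$3; pct=$4; dcs=$5
    res="nodes=${nodes}:ppn=${ppn},walltime=${walltime}"
    gres=""
    if [ "$pct" -gt 0 ]; then
        gres="Parallel_Computing_Toolbox+${pct}"
    fi
    if [ "$dcs" -gt 0 ]; then
        # Multiple gres entries are joined with "%", as in this guide.
        if [ -n "$gres" ]; then
            gres="${gres}%"
        fi
        gres="${gres}MATLAB_Distrib_Comp_Server+${dcs}"
    fi
    if [ -n "$gres" ]; then
        res="${res},gres=${gres}"
    fi
    printf '%s\n' "$res"
}

# Example: an eight-worker pool with a two-hour wall time.
matlab_resources 1 8 02:00:00 1 8
```

Keeping the DCS count and the worker count in one place makes it harder to raise MinimumNumberOfWorkers without also raising MATLAB_Distrib_Comp_Server.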

The fourth method of job submission uses the PBS qsub command to submit a parallel job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the third method since it uses function submit() and the same MATLAB function M-file myfunction.m (MATLAB function submit() accepts only a function M-file). This method differs from the third because the MATLAB client runs on a compute node rather than on the front end and the client uses the 'local' scheduler rather than the MATLAB 'torque' scheduler.

Prepare a MATLAB script M-file which finds the MATLAB 'local' scheduler, which defines a MATLAB parallel job with four workers (labs) and one task, and which calls MATLAB function submit(). Use an appropriate filename, here named mylclsubmit.m:

% FILENAME:  mylclsubmit.m

!echo "mylclsubmit.m"
!hostname

sched = findResource('scheduler','type','local');
set(sched,'ClusterMatlabRoot',matlabroot);
pjob=createParallelJob(sched);
set(pjob,'MinimumNumberOfWorkers',4);
set(pjob,'MaximumNumberOfWorkers',4);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED SUBMITTING');

pjob.wait;
result = getAllOutputArguments(pjob);
disp(result{1});
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r mylclsubmit

Submit the job as a single compute node with five processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=5,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

One processor core runs myjob.sub and mylclsubmit.m, and four processor cores run the four copies of the parallel job.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
157171.rossmann-a myusername standby  myjob.sub     --    1   5    --  00:01 Q   -- 

Job status shows five processor cores (TSK) on a single compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a000.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

mylclsubmit.m
rossmann-a000.rcac.purdue.edu
FINISHED SUBMITTING
rossmann-a000.rcac.purdue.edu:4:1:1000
rossmann-a000.rcac.purdue.edu:4:2:1000
rossmann-a000.rcac.purdue.edu:4:3:1000
rossmann-a000.rcac.purdue.edu:4:4:1000

Output shows that the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (4), a nonnegative scalar integer, defined the number of labs (4) in the pool which processed the parallel job. Because the MATLAB pool requires N workers, there must be at least N processor cores available on the compute node.

Processor cores of compute node (a000) processed the entire job. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Finally, each lab received the broadcast value: 1,000.

Any output written to standard error will appear in myjob.sub.emyjobid.
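The names of the two output files can be derived from the job ID that qsub prints. The sketch below assumes the default PBS naming (job name, then ".o" or ".e", then the numeric part of the job ID) and that no -o, -e, or -j options were given; the function name pbs_output_files is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the default standard output and standard
# error file names for a PBS job.
# Arguments: jobname jobid (jobid as printed by qsub, e.g. 157171.host)
pbs_output_files() {
    jobname=$1
    seq=${2%%.*}    # keep only the numeric part before the first dot
    printf '%s.o%s\n%s.e%s\n' "$jobname" "$seq" "$jobname" "$seq"
}

# Example with the job ID from the status listing above.
pbs_output_files myjob.sub 157171.rossmann-adm.rcac.purdue.edu
```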

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011b). Finally, the value of ppn must be one greater than the value of MaximumNumberOfWorkers.
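The relationship between pool size and ppn for the 'local' scheduler can be checked before submitting. This is a sketch only; the function name local_ppn is hypothetical, and the default limit of 12 workers assumes MATLAB R2011b (pass a second argument for another release):

```shell
#!/bin/sh
# Hypothetical helper: given a pool size for the 'local' scheduler,
# print the ppn value to request (one core for the MATLAB client plus
# one core per worker). Fails if the pool size exceeds the limit.
# Arguments: workers [max_workers]   (max_workers defaults to 12)
local_ppn() {
    workers=$1
    max=${2:-12}
    if [ "$workers" -lt 1 ] || [ "$workers" -gt "$max" ]; then
        echo "pool size must be between 1 and $max" >&2
        return 1
    fi
    echo $((workers + 1))
}

# Example: a four-worker pool needs ppn=5.
echo "nodes=1:ppn=$(local_ppn 4)"
```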

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

set(pjob,'MinimumNumberOfWorkers',13);
set(pjob,'MaximumNumberOfWorkers',13);

$ qsub -l nodes=1:ppn=14,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

Error using distcomp.simpleparalleljob/pSetMinimumNumberOfWorkers (line 59)
MinimumNumberOfWorkers must be the same as or less than MaximumNumberOfWorkers
for a job

Error in mylclsubmit (line 9)
set(pjob,'MinimumNumberOfWorkers',13);

The fifth method of submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is similar to the first and third methods since it uses either a MATLAB script M-file or a MATLAB function M-file. Like the first and third methods, this method uses a user-defined PBS configuration. It differs from those two methods in how the M-file creates the MATLAB pool. Since this method uses neither batch() nor submit() to make the MATLAB pool, two MATLAB statements specify a MATLAB pool and an spmd statement converts this parallel job to a pool job. Also, the MATLAB script M-file must quit at the end of the batch process. Another significant difference: this method uses one fewer DCS license than the first method.

Modify the MATLAB M-files myscript.m and myfunction.m with matlabpool and spmd statements. Also, modify myscript.m with the quit statement:

% FILENAME:  myscript.m

% Specify pool size.
% Convert the parallel job to a pool job.
matlabpool open 4;
spmd


if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if labindex == 1
    disp(result)
end


end   % spmd
matlabpool close force;
quit;

% FILENAME:  myfunction.m


function result = myfunction ()

    result = 0;

    % Specify pool size.
    % Convert the parallel job to a pool job.
    matlabpool open 4;
    spmd

    if labindex == 1
        % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
        N = labBroadcast(1,int64(1000));
    else
        % Each lab (rank) receives the broadcast value from lab (rank) #1.
        N = labBroadcast(1);
    end

    % Form a string with host name, total number of labs, lab ID, and broadcast value.
    [c name] =system('hostname');
    name = name(1:length(name)-1);
    fmt = num2str(floor(log10(numlabs))+1);
    str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

    % Apply global concatenate to all str's.
    % Store the concatenation of str's in the first dimension (row) and on lab #1.
    rslt = gcat(str,1,1);

    end   % spmd
    result = rslt{1};
    matlabpool close force;

end   % function

Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

module load matlab/R2011b
cd $PBS_O_WORKDIR
unset DISPLAY

# -nodisplay: run MATLAB in text mode; X11 server not needed
# -r:         read MATLAB program; use MATLAB JIT Accelerator
matlab -nodisplay -r myscript
# matlab -nodisplay -r myfunction

Run MATLAB to set the default parallel configuration to your PBS configuration:

$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit;
$

Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub

This job submission causes a second job submission.

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu:
                                                    Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
465534.rossmann-a myusername standby  myjob.sub    5620   1   1    --  00:05 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu:
                                                    Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
465534.rossmann-a myusername standby  myjob.sub    5620   1   1    --  00:05 R 00:00
465545.rossmann-a myusername standby  Job2          --    4   4    --  00:01 R   -- 

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows a second job with four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a006.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
Lab 1:
  rossmann-a006.rcac.purdue.edu:4:1:1000
  rossmann-a007.rcac.purdue.edu:4:2:1000
  rossmann-a008.rcac.purdue.edu:4:3:1000
  rossmann-a009.rcac.purdue.edu:4:4:1000
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows the name of one compute node (a006) that processed the job submission file myjob.sub. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a006,a007,a008,a009) that processed the four parallel regions. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() collected on Lab 1 the string formed in each parallel region, which includes the name of the compute node.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Configuration Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased.

The sixth method of job submission uses the PBS qsub command to submit a job to a PBS queue. If your MATLAB client is compute-intensive or you are ready to move your application to a production mode, use this method to move your client to a compute node.

This method is like the fifth method since it uses the same MATLAB M-files and the same job submission file myjob.sub. This method differs from the fifth since it uses the MATLAB 'local' configuration rather than a PBS configuration.

Run MATLAB to set the default parallel configuration to the MATLAB 'local' configuration:

$ matlab -nodisplay
>> defaultParallelConfig('local');
>> quit;
$

Submit the job as a single compute node with five processor cores and request one PCT license:

$ qsub -l nodes=1:ppn=5,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115287.rossmann-ad myusername      standby  myjob.sub   24010   1   5    --  00:01 R 00:00

Job status shows five processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a006.rcac.purdue.edu

                            < M A T L A B (R) >
                  Copyright 1984-2011 The MathWorks, Inc.
                    R2011b (7.13.0.564) 64-bit (glnxa64)
                              August 13, 2011


To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

Starting matlabpool using the 'local' configuration ... connected to 4 labs.
Lab 1:
  rossmann-a006.rcac.purdue.edu:4:1:1000
  rossmann-a006.rcac.purdue.edu:4:2:1000
  rossmann-a006.rcac.purdue.edu:4:3:1000
  rossmann-a006.rcac.purdue.edu:4:4:1000
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows that processor cores of one compute node (a006) processed the entire job. Output also shows that a "matlabpool using the 'local' configuration" is connected to four MATLAB labs. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of labs in the pool and assigns to each lab in the pool a unique value for variable labindex.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011b). Finally, the value of ppn must be one greater than the value in matlabpool open.

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

matlabpool open 13;

$ qsub -l nodes=1:ppn=14,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

Error using matlabpool (line 136)
Failed to open matlabpool. (For information in addition to the causing error,
validate the configuration 'local' in the Configurations Manager.)

Error in myscript (line 6)
matlabpool open 13;

Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool.
    This is caused by:
    You requested a minimum of 13 workers but only 12 workers are allowed with
    the local scheduler.

The seventh method of job submission uses the MATLAB Compiler mcc to compile a MATLAB M-file with a PBS configuration and submits the compiled file to a PBS queue.

This method uses the same M-files as the fifth method. Like the fifth method, this method uses one fewer DCS license than the first method:

% FILENAME:  myscript.m


% Specify pool size.
% Convert the parallel job to a pool job.
matlabpool open 4;
spmd



if labindex == 1
    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
    N = labBroadcast(1,int64(1000));
else
    % Each lab (rank) receives the broadcast value from lab (rank) #1.
    N = labBroadcast(1);
end

% Form a string with host name, total number of labs, lab ID, and broadcast value.
[c name] =system('hostname');
name = name(1:length(name)-1);
fmt = num2str(floor(log10(numlabs))+1);
str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

% Apply global concatenate to all str's.
% Store the concatenation of str's in the first dimension (row) and on lab #1.
result = gcat(str,1,1);
if labindex == 1
    disp(result)
end



end   % spmd
matlabpool close force;

quit;

% FILENAME:  myfunction.m 

function result = myfunction ()

    result = 0;

    % Specify pool size.
    % Convert the parallel job to a pool job.
    matlabpool open 4;
    spmd



    if labindex == 1
        % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
        N = labBroadcast(1,int64(1000));
    else
        % Each lab (rank) receives the broadcast value from lab (rank) #1.
        N = labBroadcast(1);
    end

    % Form a string with host name, total number of labs, lab ID, and broadcast value.
    [c name] =system('hostname');
    name = name(1:length(name)-1);
    fmt = num2str(floor(log10(numlabs))+1);
    str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

    % Apply global concatenate to all str's.
    % Store the concatenation of str's in the first dimension (row) and on lab #1.
    rslt = gcat(str,1,1);



    end   % spmd
    result = rslt{1};
    matlabpool close force;

end   % function

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_myscript.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the loaded versions. Set the default parallel configuration to your PBS configuration and quit MATLAB. Compile the MATLAB script M-file. Make a subdirectory and copy two of the files that the compiler made and your job submission file. Make the new subdirectory the current working directory and submit:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig');
>> quit
$ mcc -m myscript.m
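The staging steps described above (make a subdirectory, copy the compiled files and the job submission file, change directory) do not appear in the listing. One possible sketch follows. The function name stage_job and the subdirectory name test are illustrative assumptions, and the sketch assumes that mcc produced an executable named myscript alongside run_myscript.sh:

```shell
#!/bin/sh
# Hypothetical helper: copy the files a compiled MATLAB job needs into
# a fresh subdirectory. Stops at the first missing file.
# Arguments: destination_directory file...
stage_job() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
        # -p preserves the executable bit on the compiled files.
        cp -p "$f" "$dest"/ || return 1
    done
}

# Typical use after compiling (commented out so the sketch runs anywhere):
#   stage_job test myscript run_myscript.sh myjob.sub && cd test
```

Checking for each file before copying catches a failed compile before the job reaches the queue.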

To obtain the name of the compute node which runs this compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname before the echo statement that prints the line of dashes, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_myscript.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/myscript $*
fi
exit

Submit the job as a single compute node with one processor core and request four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

This job runs myjob.sub on a compute node, and myjob.sub in turn submits the parallel job (converted to a pool job). The first job must run at least as long as the second job since it collects the results of the parallel job.
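Because the outer job must outlast the inner one, it can help to compare the two wall times before submitting. The following sketch assumes wall times in hh:mm:ss form; the function names walltime_seconds and walltime_covers are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: convert an hh:mm:ss wall time to seconds.
# Fails (nonzero status, no output) if the input is not in that form.
walltime_seconds() {
    echo "$1" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3; ok = 1 }
                         END { exit !ok }'
}

# Hypothetical helper: succeed if the outer job's wall time is at least
# as long as the inner job's wall time.
# Arguments: outer_walltime inner_walltime
walltime_covers() {
    outer=$(walltime_seconds "$1") || return 2
    inner=$(walltime_seconds "$2") || return 2
    [ "$outer" -ge "$inner" ]
}

# Example with the values used in this method: 5 minutes for the qsub
# command and 1 minute for the job that the pool submits.
if walltime_covers 00:05:00 00:01:00; then
    echo "outer wall time is sufficient"
fi
```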

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername    standby  myjob.sub   28611   1   1   5957 00:05 R 00:00

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                                  Req'd  Req'd   Elap
Job ID          Username       Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- --------       -------- ---------- ------ --- --- ------ ----- - -----
115292.rossmann-ad myusername    standby  myjob.sub   28611   1   1   5957 00:05 R 00:00
115293.rossmann-ad myusername    standby  Job1        29390   4   4   7005 00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows that this job submits a second job with four processor cores (TSK) on four compute nodes (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a006.rcac.purdue.edu
run_myscript.sh
rossmann-a006.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB/R2011b/runtime/glnxa64:/apps/rhel5/MATLAB/R2011b/bin/glnxa64:/apps/rhel5/MATLAB/R2011b/sys/os/glnxa64:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.
Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
Lab 1:
  rossmann-a006.rcac.purdue.edu:4:1:1000
  rossmann-a007.rcac.purdue.edu:4:2:1000
  rossmann-a008.rcac.purdue.edu:4:3:1000
  rossmann-a009.rcac.purdue.edu:4:4:1000

Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows the name of the compute node (a006) that ran the job submission file myjob.sub and the compiler-generated script run_myscript.sh. It also shows the names of the four compute nodes (a006,a007,a008,a009) that ran the four scattered processor cores (four MATLAB labs) that processed the four parallel regions. Unlike a parfor loop, an spmd statement sets variable numlabs to the number of labs (4) in the pool and assigns to each lab running a parallel region a unique value for variable labindex. There are four copies of the parallel job, so there are four lab IDs.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply this method of job submission to a MATLAB function M-file, prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

Compile both the wrapper and the function, then submit:

$ mcc -m mywrapper.m myfunction.m
$ mkdir test
$ cp mywrapper test
$ cp run_mywrapper.sh test
$ cp myjob.sub test
$ cd test
$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

To scale up this method to handle a real application, increase the wall time to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Configuration Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Increase the wall time in the qsub command accordingly. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses acquired. Increase the value of MATLAB_Distrib_Comp_Server in the qsub command to match the new size of the pool.

The eighth method of job submission uses the MATLAB Compiler mcc to compile a MATLAB M-file with the 'local' configuration and submits the compiled file to a PBS queue.

This method is similar to the seventh method since it uses the same MATLAB M-files and the same job submission file myjob.sub. This method differs from the seventh since it uses the MATLAB 'local' configuration.

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the versions loaded. Set the default parallel configuration to the MATLAB 'local' configuration and compile the MATLAB script M-file. Make a subdirectory and copy two of the files that the compiler made and your job submission file. Make the new subdirectory the current working directory and submit:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('local')
>> quit
$ mcc -m myscript.m

To obtain the name of the compute node which runs this compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname before the echo statement that prints the line of dashes, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_myscript.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/myscript $*
fi
exit

Submit the job as a single compute node with four processor cores:

$ qsub -l nodes=1:ppn=4,walltime=00:05:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
135783.rossmann-a myusername standby  myjob.sub   18893   1   4    --  00:05 R 00:00

Job status shows four processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a006.rcac.purdue.edu
run_myscript.sh
rossmann-a006.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
Warning: No display specified.  You will not be able to display graphics on the screen.

Starting matlabpool using the 'local' configuration ... connected to 4 labs.
Lab 1:
  rossmann-a006.rcac.purdue.edu:4:1:1000
  rossmann-a006.rcac.purdue.edu:4:2:1000
  rossmann-a006.rcac.purdue.edu:4:3:1000
  rossmann-a006.rcac.purdue.edu:4:4:1000
Sending a stop signal to all the labs ... stopped.
Did not find any pre-existing parallel jobs created by matlabpool.

Output shows the name of the one compute node (a006) that ran the entire job. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() collected on Lab 1 the string formed by each lab, which includes the name of the compute node.

Any output written to standard error will appear in myjob.sub.emyjobid.

To apply the eighth method of job submission to a MATLAB function M-file, prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

% FILENAME:  mywrapper.m

result = myfunction();
disp(result)
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mywrapper.sh /apps/rhel5/MATLAB/R2011b

Compile both the wrapper and the function then submit:

$ mcc -m mywrapper.m myfunction.m
$ qsub -l nodes=1:ppn=4,walltime=00:01:00 myjob.sub

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must equal the value in the matlabpool open statement.
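
The rule that ppn must equal the pool size can be sketched as a small shell helper. This is only an illustration: the function name local_pool_qsub and the variable MAX_LOCAL_WORKERS are hypothetical, and the limit of 12 is an assumption taken from the 'local' configuration limit quoted above. The helper only prints a qsub command; it does not submit anything.

```shell
#!/bin/bash
# Hypothetical helper (not part of MATLAB or PBS): print a qsub command
# whose ppn value matches the intended MATLAB pool size.
# MAX_LOCAL_WORKERS is an assumption based on the limit quoted in this guide.
MAX_LOCAL_WORKERS=12

local_pool_qsub() {
    pool_size=$1      # the value used in "matlabpool open N"
    walltime=$2       # requested wall time, HH:MM:SS
    if [ "$pool_size" -lt 1 ] || [ "$pool_size" -gt "$MAX_LOCAL_WORKERS" ]; then
        echo "pool size must be between 1 and $MAX_LOCAL_WORKERS" >&2
        return 1
    fi
    echo "qsub -l nodes=1:ppn=${pool_size},walltime=${walltime} myjob.sub"
}

# Example: a pool of 8 labs for 30 minutes.
local_pool_qsub 8 00:30:00
```

The printed command can be reviewed and then run by hand. A pool size above the assumed limit makes the helper return a nonzero status instead of printing a command.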

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

matlabpool open 13;

$ qsub -l nodes=1:ppn=13,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using matlabpool (line 136)
Failed to open matlabpool. (For information in addition to the causing error,
validate the configuration 'local' in the Configurations Manager.)

Error in myscript (line 6)
matlabpool open 13;

Caused by:
    Error using distcomp.interactiveclient/start (line 88)
    Failed to start matlabpool.
    This is caused by:
    You requested a minimum of 13 workers but only 12 workers are allowed with
    the local scheduler.

The ninth method of job submission uses the MATLAB Compiler mcc to compile job submission details with the MATLAB 'torque' scheduler in a script M-file and the code of a parallel job in a function M-file. Since the compilation includes a call to function submit(), the program can be a parallel job. Then the method submits the compiled file to a PBS queue.

Prepare a MATLAB function M-file (function submit() accepts only a function M-file). Use an appropriate filename, here named myfunction.m:

% FILENAME:  myfunction.m

function result = myfunction ()

   if labindex == 1
       % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
       N = labBroadcast(1,int64(1000));
   else
       % Each lab (rank) receives the broadcast value from lab (rank) #1.
       N = labBroadcast(1);
   end

   % Form a string with host name, total number of labs, lab ID, and broadcast value.
   [c name] =system('hostname');
   name = name(1:length(name)-1);
   fmt = num2str(floor(log10(numlabs))+1);
   str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

   % Apply global concatenate to all str's.
   % Store the concatenation of str's in the first dimension (row) and on lab #1.
   result = gcat(str,1,1);

end

Prepare a MATLAB script M-file which finds the MATLAB scheduler 'torque', defines a MATLAB parallel job with four workers and one task, and calls MATLAB function submit(). Use an appropriate filename, here named mypbssubmit.m:

% FILENAME:  mypbssubmit.m

sched = findResource('scheduler','type','torque');
set(sched,'ClusterMatlabRoot',matlabroot);
set(sched,'SubmitArguments','-l walltime=00:01:00,gres=MATLAB_Distrib_Comp_Server+4');
pjob=createParallelJob(sched);
set(pjob,'MinimumNumberOfWorkers',4);
set(pjob,'MaximumNumberOfWorkers',4);
set(pjob,'FileDependencies',{'myfunction.m'});
task = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED  SUBMITTING')

pjob.wait;
results = getAllOutputArguments(pjob);
disp(results{1})
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mypbssubmit.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC Version 4.6.2 is available on Rossmann. Verify the versions loaded. Set the default parallel configuration to your PBS configuration and quit MATLAB. Compile the MATLAB script M-file along with the code of the parallel job:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('mypbsconfig')
>> quit
$ mcc -m mypbssubmit.m myfunction.m

To obtain the name of the compute node which runs this compiler-generated script run_mypbssubmit.sh, insert the Linux commands echo and hostname before the echo statement that prints the row of hyphens, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_mypbssubmit.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/mypbssubmit $*
fi
exit
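
The manual edit described above can also be scripted. The following is a minimal sketch, assuming the generated script contains a line that begins with echo followed by a quoted row of hyphens, as in the listing above. The function name add_hostname_lines is hypothetical. It writes the modified script to standard output and leaves the original file untouched.

```shell
#!/bin/bash
# Hypothetical helper: print a copy of a compiler-generated run_*.sh script
# with an echo of the script name and a hostname command placed before the
# first line of the form: echo "------"
add_hostname_lines() {
    script=$1
    awk -v name="$(basename "$script")" '
        !inserted && /^echo "-+"/ {
            print "echo \"" name "\""
            print "hostname"
            print ""
            inserted = 1
        }
        { print }
    ' "$script"
}

# Usage (review the new file before replacing the original):
#   add_hostname_lines run_mypbssubmit.sh > run_mypbssubmit.sh.new
```

If the generated script has no such line, the output is an unchanged copy, so comparing the two files with diff shows whether the insertion took place.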

Submit the job, requesting a single compute node with one processor core and four DCS licenses:

$ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub

This first job runs myjob.sub on a compute node; myjob.sub in turn submits the parallel job as a second job. The first job must run at least as long as the second job since it collects the results of the parallel job.
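
The wall time relationship between the two jobs can be checked before submitting. This is a minimal sketch with hypothetical function names (walltime_seconds, outer_covers_inner). It compares only the two requested wall times and does not account for any time the second job spends waiting in the queue, which the first job must also cover.

```shell
#!/bin/bash
# Convert a wall time written as HH:MM:SS into seconds.
walltime_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed when the outer (submitting) job may run at least as long as
# the inner (parallel) job.
outer_covers_inner() {
    [ "$(walltime_seconds "$1")" -ge "$(walltime_seconds "$2")" ]
}

# Values from this example: 00:05:00 in the qsub command,
# 00:01:00 in the SubmitArguments of mypbssubmit.m.
if outer_covers_inner 00:05:00 00:01:00; then
    echo "outer walltime is sufficient"
else
    echo "increase the outer walltime"
fi
```

With the values used in this example, the check prints "outer walltime is sufficient".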

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
135783.rossmann-a myusername standby  myjob.sub   18893   1   1    --  00:05 R 00:00
$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
135783.rossmann-a myusername standby  myjob.sub   18893   1   1    --  00:05 R 00:00
135784.rossmann-a myusername standby  Job1        19382   4   4    --  00:01 R 00:00

At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows that this job has submitted a second job with four processor cores (TSK) on four compute nodes (NDS).
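
The NDS and TSK columns can be pulled out of the listing with awk. This is a minimal sketch, assuming the column layout shown above (job ID in field 1, NDS in field 6, TSK in field 7, state in field 10). The function name qstat_nodes_tasks is hypothetical. The sample input is copied from the listing above; in practice the input would come from qstat -u myusername.

```shell
#!/bin/bash
# Hypothetical helper: read "qstat -u" output on standard input and print
# "job_id nodes tasks state" for every job line. Header lines are skipped
# because they do not start with a numeric job ID followed by a dot.
qstat_nodes_tasks() {
    awk '$1 ~ /^[0-9]+\./ { print $1, $6, $7, $10 }'
}

# Sample input taken from the second listing above.
qstat_nodes_tasks <<'EOF'
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
135783.rossmann-a myusername standby  myjob.sub   18893   1   1    --  00:05 R 00:00
135784.rossmann-a myusername standby  Job1        19382   4   4    --  00:01 R 00:00
EOF
```

For the sample input, the helper prints one line per job: "135783.rossmann-a 1 1 R" followed by "135784.rossmann-a 4 4 R".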

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a002.rcac.purdue.edu
run_mypbssubmit.sh
rossmann-a002.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB/R2011b/runtime/glnxa64:/apps/rhel5/MATLAB/R2011b/bin/glnxa64:/apps/rhel5/MATLAB/R2011b/sys/os/glnxa64:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.
FINISHED  SUBMITTING
rossmann-a002.rcac.purdue.edu:4:1:1000
rossmann-a003.rcac.purdue.edu:4:2:1000
rossmann-a006.rcac.purdue.edu:4:3:1000
rossmann-a007.rcac.purdue.edu:4:4:1000

Output shows that the nonnegative scalar integer which is the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (4) defined the number of labs (4) in the pool. Because the MATLAB pool requires N labs, there must be at least N processor cores available on the cluster.

Output shows the name of the compute node (a002) that ran the job submission file myjob.sub and the compiler-generated script run_mypbssubmit.sh. It also shows the names of the four compute nodes (a002, a003, a006, a007) that hosted the four scattered processor cores (four MATLAB labs) which ran the four copies of the parallel job. The value of variable numlabs is the number of labs (4), and the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() gathered on Lab 1 the string formed by each lab, which includes the name of that lab's compute node.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running parallel job. Second, increase the wall time in the SubmitArguments property set in mypbssubmit.m. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of DCS licenses acquired. Increase the value of MATLAB_Distrib_Comp_Server to match the value of MinimumNumberOfWorkers.
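
When scaling up, the worker counts and the license count must stay in step. The following is a minimal sketch of a consistency check, assuming the values appear in the M-file as literal integers written in the same form as in mypbssubmit.m above. The function name check_worker_counts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when MinimumNumberOfWorkers,
# MaximumNumberOfWorkers, and the MATLAB_Distrib_Comp_Server license
# count found in an M-file are all the same number.
check_worker_counts() {
    mfile=$1
    min=$(sed -n "s/.*'MinimumNumberOfWorkers', *\([0-9][0-9]*\).*/\1/p" "$mfile")
    max=$(sed -n "s/.*'MaximumNumberOfWorkers', *\([0-9][0-9]*\).*/\1/p" "$mfile")
    lic=$(sed -n "s/.*MATLAB_Distrib_Comp_Server+\([0-9][0-9]*\).*/\1/p" "$mfile")
    [ -n "$min" ] && [ "$min" = "$max" ] && [ "$min" = "$lic" ]
}

# Run the check only when the file from this example is present.
if [ -f mypbssubmit.m ]; then
    if check_worker_counts mypbssubmit.m; then
        echo "worker and license counts agree"
    else
        echo "worker and license counts differ"
    fi
fi
```

The check covers only the M-file. The license count in the qsub command for the first job still needs to be updated by hand.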

The tenth method of job submission uses the MATLAB Compiler mcc to compile job submission details with the MATLAB 'local' scheduler in a script M-file and the code of a parallel job in a function M-file. Since the compilation includes a call to function submit(), the program can be a parallel job. Then the method submits the compiled file to a PBS queue.

Prepare a MATLAB function M-file (function submit() accepts only a function M-file). Use an appropriate filename, here named myfunction.m:

% FILENAME:  myfunction.m

function result = myfunction

    if labindex == 1
        % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
        N = labBroadcast(1,int64(1000));
    else
        % Each lab (rank) receives the broadcast value from lab (rank) #1.
        N = labBroadcast(1);
    end

    % Form a string with host name, total number of labs, lab ID, and broadcast value.
    [c name] =system('hostname');
    name = name(1:length(name)-1);
    fmt = num2str(floor(log10(numlabs))+1);
    str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);

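    % Apply global concatenate to all str's.
    % Store the concatenation of str's in the first dimension (row) and on lab #1.
    result = gcat(str,1,1);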

end   % function

Prepare a MATLAB script M-file which finds the MATLAB 'local' scheduler, defines a MATLAB parallel job with four workers and one task, and calls MATLAB function submit(). Use an appropriate filename, here named mylclsubmit.m:

% FILENAME:  mylclsubmit.m

!echo "mylclsubmit.m"
!hostname

sched = findResource('scheduler','type','local');
set(sched,'ClusterMatlabRoot',matlabroot);
pjob=createParallelJob(sched);
set(pjob,'MinimumNumberOfWorkers',4);
set(pjob,'MaximumNumberOfWorkers',4);
set(pjob,'FileDependencies',{'myfunction.m'});
T = createTask(pjob,@myfunction,1,{});
submit(pjob);
disp('FINISHED  SUBMITTING')

pjob.wait;
results = getAllOutputArguments(pjob);
disp(results{1})
quit;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

echo "myjob.sub"
hostname

cd $PBS_O_WORKDIR
unset DISPLAY

./run_mylclsubmit.sh /apps/rhel5/MATLAB/R2011b

On a front end, load modules for MATLAB and GCC. The MATLAB R2011b Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Rossmann. Verify the versions loaded. Set the default parallel configuration to the MATLAB 'local' configuration (R2011a added support for compiling PCT code on the 'local' configuration) and quit MATLAB. Compile the MATLAB script M-file along with the code of the parallel job:

$ module load matlab/R2011b
$ module load gcc/4.6.2
$ module list
  1) matlab/R2011b   2) gcc/4.6.2
$ matlab -nodisplay
>> defaultParallelConfig('local')
>> quit
$ mcc -m mylclsubmit.m myfunction.m

To obtain the name of the compute node which runs this compiler-generated script run_mylclsubmit.sh, insert the Linux commands echo and hostname before the echo statement that prints the row of hyphens, so that the script appears as follows:

#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`

echo "run_mylclsubmit.sh"
hostname

echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
    MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
    LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export LD_LIBRARY_PATH;
  export XAPPLRESDIR;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  "${exe_dir}"/mylclsubmit $*
fi
exit

Submit the job, requesting a single compute node with four processor cores:

$ qsub -l nodes=1:ppn=4,walltime=00:05:00 myjob.sub

View job status:

$ qstat -u myusername

rossmann-adm.rcac.purdue.edu: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
135783.rossmann-a myusername standby  myjob.sub   18893   1   4    --  00:05 R 00:00

Job status shows four processor cores (TSK) on one compute node (NDS).

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
myjob.sub
rossmann-a006.rcac.purdue.edu
run_mylclsubmit.sh
rossmann-a006.rcac.purdue.edu
------------------------------------------
Setting up environment variables
---
LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB/R2011b/runtime/glnxa64:/apps/rhel5/MATLAB/R2011b/bin/glnxa64:/apps/rhel5/MATLAB/R2011b/sys/os/glnxa64:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB/R2011b/sys/java/jre/glnxa64/jre/lib/amd64
Warning: No display specified.  You will not be able to display graphics on the screen.
FINISHED  SUBMITTING
rossmann-a006.rcac.purdue.edu:4:1:1000
rossmann-a006.rcac.purdue.edu:4:2:1000
rossmann-a006.rcac.purdue.edu:4:3:1000
rossmann-a006.rcac.purdue.edu:4:4:1000

Output shows that the nonnegative scalar integer which is the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers (4) defined the number of labs (4) in the pool. Because the MATLAB pool requires N workers, there must be at least N processor cores available on the compute node.

Output shows that one compute node (a006) processed the entire job. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() gathered on Lab 1 the string formed by each lab, which includes the name of that lab's compute node.

Any output written to standard error will appear in myjob.sub.emyjobid.

To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running parallel job. Also, consider increasing the size of the MATLAB pool, the value of the properties MinimumNumberOfWorkers and MaximumNumberOfWorkers which appear as arguments in the calls of function set(). The maximum possible size of the pool is the number of workers allowed in the 'local' configuration: 8 (R2009a) and 12 (R2011a). Finally, the value of ppn must equal the value of MaximumNumberOfWorkers.

Specifying a MATLAB pool with 13 labs exceeds the 'local' configuration of MATLAB R2011b. The relevant lines of code and the error follow:

set(pjob,'MinimumNumberOfWorkers',13);
set(pjob,'MaximumNumberOfWorkers',13);

$ qsub -l nodes=1:ppn=13,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub

{Error using distcomp.simpleparalleljob/pSetMinimumNumberOfWorkers (line 59)
MinimumNumberOfWorkers must be the same as or less than MaximumNumberOfWorkers
for a job

Error in mylclsubmit (line 9)


}
distcomp:job:InvalidProperty

Octave (Interpreting an M-file)

GNU Octave is a high-level, interpreted, programming language for numerical computations. The Octave interpreter is the part of Octave which reads M-files, oct-files, and MEX-files and executes Octave statements. Octave is a structured language (similar to C) and mostly compatible with MATLAB. You may use Octave to avoid the need for a MATLAB license, both during development and as a deployed application. By doing so, you may be able to run your application on more systems or more easily distribute it to others.

This section illustrates how to submit a small Octave job to a PBS queue. This Octave example computes the inverse of a matrix.

Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Invert matrix A.
A = [1 2 3; 4 5 6; 7 8 0]
inv(A)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR

unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

The command octave myjob.m (without the redirection) also works in the preceding script.

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave

unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q << EOF
octave << EOF

% Invert matrix A.
A = [1 2 3; 4 5 6; 7 8 0]
inv(A)

quit
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
A =

   1   2   3
   4   5   6
   7   8   0

ans =

  -1.77778   0.88889  -0.11111
   1.55556  -0.77778   0.22222
  -0.11111   0.22222  -0.11111

Any output written to standard error will appear in myjob.sub.emyjobid.
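
The names of the two output files follow from the job name and the numeric part of the job ID. This is a minimal sketch, assuming the naming pattern used throughout this guide (jobname.oNNN and jobname.eNNN). The function name pbs_output_files is hypothetical, and the job ID in the example is taken from the qstat listings elsewhere in this guide.

```shell
#!/bin/bash
# Hypothetical helper: given a job name and the job ID printed by qsub,
# print the names of the standard output and standard error files.
pbs_output_files() {
    job_name=$1
    job_number=${2%%.*}    # keep only the digits before the first dot
    echo "${job_name}.o${job_number}"
    echo "${job_name}.e${job_number}"
}

pbs_output_files myjob.sub 135783.rossmann-adm.rcac.purdue.edu
```

For the job ID shown, the helper prints myjob.sub.o135783 and myjob.sub.e135783.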

Octave Compiler (Compiling an M-file)

Octave does not offer a compiler to translate an M-file into an executable file for additional speed or distribution. You may wish to consider recoding an M-file as either an oct-file or a stand-alone program.

Octave Executable (Oct-file)

An oct-file is an "Octave Executable". It offers a way for Octave code to call functions written in C, C++, or Fortran as though these external functions were built-in Octave functions. You may wish to use an oct-file if you would like to call an existing C, C++, or Fortran function directly from Octave rather than reimplementing that code as an Octave function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than Octave, you may be able to substantially improve performance over Octave source code, especially for statements like for and while.

This section illustrates how to submit a small Octave job with an oct-file to a PBS queue. This Octave example calls a C function which adds two matrices.

Prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

Combine the computational routine with an oct-file, which contains the necessary external function interface of Octave. The name of the file is matrixSum.cc:

/**********************************************************
 * FILENAME:  matrixSum.cc
 *
 * Adds two MxN arrays (inMatrix).
 * Outputs one MxN array (outMatrix).
 *
 * The calling syntax is:
 *
 *      outMatrix = matrixSum (inMatrix, inMatrix)
 *
 * This is an oct-file for Octave.
 *
 **********************************************************/

#include <octave/oct.h>

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Gateway Function */
DEFUN_DLD (matrixSum, args, nargout, "matrixSum: A + B") {

    NDArray inMatrix_a;                /* mxn input matrix   */
    NDArray inMatrix_b;                /* mxn input matrix   */
    int nrows_a,ncols_a;               /* size of matrix a   */
    int nrows_b,ncols_b;               /* size of matrix b   */
    NDArray outMatrix_c;               /* mxn output matrix  */

    /* Check for proper number of input arguments */
    if (args.length() != 2) {
       printf("matrixSum:  two inputs required.");
       exit(-1);
    }
    /* Check for proper number of output arguments */
    if (nargout != 1) {
       printf("matrixSum:  one output required.");
       exit(-1);
    }

    /* Check that both input matrices are real matrices. */
    if (!args(0).is_real_matrix()) {
       printf("matrixSum:  expecting LHS (arg 1) to be a real matrix");
       exit(-1);
    }
    if (!args(1).is_real_matrix()) {
       printf("matrixSum:  expecting RHS (arg 2) to be a real matrix");
       exit(-1);
    }

    /* Get dimensions of the first input matrix */
    nrows_a = args(0).rows();
    ncols_a = args(0).columns();
    /* Get dimensions of the second input matrix */
    nrows_b = args(1).rows();
    ncols_b = args(1).columns();

    /* Check for equal number of rows. */
    if(nrows_a != nrows_b) {
       printf("matrixSum:  unequal number of rows.");
       exit(-1);
    }
    /* Check for equal number of columns. */
    if(ncols_a != ncols_b) {
       printf("matrixSum:  unequal number of columns.");
       exit(-1);
    }

    /* Make a pointer to the real data in the first input matrix  */
    inMatrix_a = args(0).array_value();
    /* Make a pointer to the real data in the second input matrix  */
    inMatrix_b = args(1).array_value();

    /* Construct output matrix as a copy of the first input matrix. */
    outMatrix_c = args(0).array_value();

    /* Call the computational routine.  */
    double* ptr_a = inMatrix_a.fortran_vec();
    double* ptr_b = inMatrix_b.fortran_vec();
    double* ptr_c = outMatrix_c.fortran_vec(); 
    matrixSum(ptr_a,ptr_b,ptr_c,nrows_a*ncols_a);

    return octave_value(outMatrix_c);
}

To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

$ module load octave

To compile matrixSum.cc into an oct-file:

$ mkoctfile matrixSum.cc

Two new files appear after the compilation:

matrixSum.o
matrixSum.oct

The name of the Octave-callable oct-file is matrixSum.oct.

Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Call the separately compiled and dynamically linked oct-file.
A = [1,1,1;1,1,1]
B = [2,2,2;2,2,2]
C = matrixSum(A,B)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
A =

   1   1   1
   1   1   1

B =

   2   2   2
   2   2   2

C =

   3   3   3
   3   3   3

Any output written to standard error will appear in myjob.sub.emyjobid.

Octave Standalone Program

A stand-alone Octave program is a C, C++, or Fortran program which calls user-written oct-files and the same libraries that Octave uses. A stand-alone program has access to Octave objects, such as the array and matrix classes, as well as all the Octave algorithms. If you would like to implement performance-critical routines in C, C++, or Fortran and still call select Octave functions, a stand-alone Octave program may be a good option. This offers the possibility of substantially improved performance over Octave source code, especially for statements like for and while, while still allowing use of specialized Octave functions where useful.

This section illustrates how to submit a small, stand-alone Octave program to a PBS queue. This C++ example uses class Matrix and calls an Octave script which prints a message.

Prepare an Octave-compatible M-file with an appropriate filename, here named hello.m:

% FILENAME:  hello.m

disp('hello.m:    hello, world')

Prepare a C++ function file with the necessary external function interface and with an appropriate filename, here named hello.cc:

// FILENAME:  hello.cc

#include <iostream>
#include <octave/oct.h>
#include <octave/octave.h>
#include <octave/parse.h>
#include <octave/toplev.h> /* do_octave_atexit */

int main (const int argc, char ** argv) {

    const char * argvv [] = {"" /* name of program, not relevant */, "--silent"};
    octave_main (2, (char **) argvv, true /* embedded */);

    std::cout << "hello.cc:   hello, world" << std::endl;

    const octave_value_list result = feval ("hello");  /* invoke hello.m */

    int n = 2;
    Matrix a_matrix = Matrix (1,2);
    a_matrix (0,0) = 888;
    a_matrix (0,1) = 999;
    std::cout << "hello.cc:   " << a_matrix;

    do_octave_atexit ();

}

To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

$ module load octave

To compile the stand-alone Octave program:

$ mkoctfile --link-stand-alone hello.cc -o hello

Two new files appear after the compilation:

hello
hello.o

The name of the compiled, stand-alone Octave program is hello.

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load gcc
cd $PBS_O_WORKDIR
unset DISPLAY

./hello

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello.cc:   hello, world
hello.m:    hello, world
hello.cc:    888 999

Any output written to standard error will appear in myjob.sub.emyjobid.

Octave (MEX-file)

MEX stands for "MATLAB Executable". A MEX-file offers a way for MATLAB code to call functions written in C, C++ or Fortran as though these external functions were built-in MATLAB functions. You may wish to use a MEX-file if you would like to call an existing C, C++, or Fortran function directly from MATLAB rather than reimplementing that code as a MATLAB function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than MATLAB, you may be able to substantially improve performance over MATLAB source code, especially for statements like for and while.

Octave includes an interface which can link compiled, legacy MEX-files. This interface allows sharing code between Octave and MATLAB users. In Octave, an oct-file will always perform better than a MEX-file, so you should write new code using the oct-file interface, if possible. However, you may test a new MEX-file in Octave then use it in a MATLAB application.

This section illustrates how to submit a small Octave job with a MEX-file to a PBS queue. This Octave example calls a C function which adds two matrices.

Prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, int n) {
    int i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

Combine the computational routine with a MEX-file, which contains the necessary external function interface of MATLAB. In the computational routine, change int to mwSize. The name of the file is matrixSum.c:

/*************************************************************
 * FILENAME:  matrixSum.c
 *
 * Adds two MxN arrays (inMatrix).
 * Outputs one MxN array (outMatrix).
 *
 * The calling syntax is:
 *
 *      outMatrix = matrixSum(inMatrix, inMatrix)
 *
 * This is a MEX-file which Octave will execute.
 *
 **************************************************************/

#include "mex.h"

/* Computational Routine */
void matrixSum (double *a, double *b, double *c, mwSize n) {
    mwSize i;

    /* Component-wise addition. */
    for (i=0; i<n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Gateway Function */
void mexFunction (int nlhs, mxArray *plhs[],
                  int nrhs, const mxArray *prhs[]) {

    double *inMatrix_a;               /* mxn input matrix  */
    double *inMatrix_b;               /* mxn input matrix  */
    mwSize nrows_a,ncols_a;           /* size of matrix a  */
    mwSize nrows_b,ncols_b;           /* size of matrix b  */
    double *outMatrix_c;              /* mxn output matrix */

    /* Check for proper number of arguments */
    if(nrhs!=2) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nrhs","Two inputs required.");
    }
    if(nlhs!=1) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:nlhs","One output required.");
    }

    /* Get dimensions of the first input matrix */
    nrows_a = mxGetM(prhs[0]);
    ncols_a = mxGetN(prhs[0]);
    /* Get dimensions of the second input matrix */
    nrows_b = mxGetM(prhs[1]);
    ncols_b = mxGetN(prhs[1]);

    /* Check for equal number of rows. */
    if(nrows_a != nrows_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of rows.");
    }
    /* Check for equal number of columns. */
    if(ncols_a != ncols_b) {
        mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of columns.");
    }

    /* Make a pointer to the real data in the first input matrix  */
    inMatrix_a = mxGetPr(prhs[0]);
    /* Make a pointer to the real data in the second input matrix  */
    inMatrix_b = mxGetPr(prhs[1]);

    /* Make the output matrix */
    plhs[0] = mxCreateDoubleMatrix(nrows_a,ncols_a,mxREAL);

    /* Make a pointer to the real data in the output matrix */
    outMatrix_c = mxGetPr(plhs[0]);

    /* Call the computational routine */
    matrixSum(inMatrix_a,inMatrix_b,outMatrix_c,nrows_a*ncols_a);
}

To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

$ module load octave

To compile matrixSum.c into a MEX-file:

$ mkoctfile --mex matrixSum.c

Two new files appear after the compilation:

matrixSum.mex
matrixSum.o

The name of the Octave-callable MEX-file is matrixSum.mex.

Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

% FILENAME:  myjob.m

% Call the separately compiled and dynamically linked MEX-file.
A = [1,1,1;1,1,1]
B = [2,2,2;2,2,2]
C = matrixSum(A,B)

quit

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load octave
cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -q option to suppress startup messages.
# octave -q < myjob.m
octave < myjob.m

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
A =

   1   1   1
   1   1   1

B =

   2   2   2
   2   2   2

C =

   3   3   3
   3   3   3

Any output written to standard error will appear in myjob.sub.emyjobid.
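
As a cross-check on the result above, the element-wise sum that matrixSum performs can be sketched in plain Python. This is only an illustrative sketch and is not part of the MEX-file workflow; the function name matrix_sum is hypothetical, and its error messages simply mirror those in the gateway function:

```python
def matrix_sum(a, b):
    """Return the element-wise sum of two equally sized 2-D lists."""
    # Mirror the gateway function's checks on the matrix dimensions.
    if len(a) != len(b):
        raise ValueError("Unequal number of rows.")
    if any(len(row_a) != len(row_b) for row_a, row_b in zip(a, b)):
        raise ValueError("Unequal number of columns.")
    # Add corresponding elements row by row.
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


A = [[1, 1, 1], [1, 1, 1]]
B = [[2, 2, 2], [2, 2, 2]]
C = matrix_sum(A, B)
print(C)
```

Running the sketch prints [[3, 3, 3], [3, 3, 3]], which matches the matrix C in the job output.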

Perl

Perl is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Perl job to a PBS queue. This Perl example prints a single line of text.

Prepare a Perl input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

print "hello, world\n";

Discover the absolute path of Perl:

$ which perl
/usr/local/bin/perl

Perl is also available at a second absolute path, /usr/bin/perl, which the job submission file below uses.

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

cd $PBS_O_WORKDIR
unset DISPLAY

# Use the -w option to issue warnings.
/usr/bin/perl -w myjob.in

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.
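
The names of the standard output and standard error files follow a fixed pattern: the submission file name, then ".o" or ".e", then the job ID. A minimal Python sketch of that pattern follows. It assumes that the job ID printed by qsub has the form number.servername and that only the numeric part appears in the file names; the helper name pbs_output_files and the sample job ID are hypothetical:

```python
def pbs_output_files(script_name, job_id):
    """Return (stdout_file, stderr_file) names for a PBS job.

    Assumption: the default naming shown in this guide,
    <script>.o<jobid> and <script>.e<jobid>, where <jobid> is the
    numeric part of the identifier printed by qsub.
    """
    # Keep only the part before the first dot, e.g. "12345" from "12345.myserver".
    sequence = str(job_id).split(".", 1)[0]
    return (script_name + ".o" + sequence, script_name + ".e" + sequence)


# "12345.myserver" is a made-up job ID used only for illustration.
stdout_file, stderr_file = pbs_output_files("myjob.sub", "12345.myserver")
print(stdout_file)
print(stderr_file)
```

A helper like this can be handy in a wrapper script that waits for a job to finish and then reads its output files.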

Python

Python is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Python job to a PBS queue. This Python example prints a single line of text.

Prepare a Python input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# The parenthesized form of print works in both Python 2 and Python 3.
print("hello, world")

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load python
cd $PBS_O_WORKDIR
unset DISPLAY

python myjob.in

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello, world

Any output written to standard error will appear in myjob.sub.emyjobid.
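
The job submission files for Octave, Perl, and Python above share one pattern: an optional module load, a change to the submission directory, unset DISPLAY, and the command to run. As a rough sketch, that pattern could be generated with a few lines of Python. The function make_submission_script and its parameters are hypothetical conveniences, not an ITaP tool:

```python
def make_submission_script(command, module=None, filename="myjob.sub"):
    """Return the text of a simple PBS job submission file."""
    lines = ["#!/bin/sh -l", "# FILENAME:  " + filename, ""]
    if module:
        # Load the application module only when one is needed.
        lines.append("module load " + module)
    # Run from the directory where qsub was invoked.
    lines.append("cd $PBS_O_WORKDIR")
    lines.append("unset DISPLAY")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


script = make_submission_script("python myjob.in", module="python")
print(script)
```

Writing the returned text to myjob.sub gives a file equivalent to the Python submission file shown above.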

If you would like to install a Python package for your own personal use, follow these directions. Make sure you have a download link for the software you want to use, and substitute it on the wget line.

$ mkdir ~/src
$ cd ~/src
$ wget http://path/to/source/tarball/app-1.0.tar.gz
$ tar xzvf app-1.0.tar.gz
$ cd app-1.0
$ module load python/2.7.2
$ python setup.py install --user
$ cd ~
$ python
>>> import app
>>> quit()

The "import app" line should return without any output if the package installed successfully. You can then import the package in your Python scripts.
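
The same check can be made from a script rather than an interactive session. The sketch below is one possible approach, not a required step: the helper is_importable is hypothetical, "json" merely stands in for a package known to be present, and "app" should be replaced with the name of the package you installed. The --user option installs into the per-user site-packages directory, whose location the sketch also prints:

```python
import importlib
import site


def is_importable(package_name):
    """Return True if the named package can be imported, False otherwise."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        return False
    return True


# Directory that "python setup.py install --user" installs into.
print(site.getusersitepackages())

# "json" ships with Python; replace it with your package name, e.g. "app".
print(is_importable("json"))
```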

R

R, a GNU project, is a language and environment for statistics and graphics. It is an open source implementation of the S programming language. This section illustrates how to submit a small R job to a PBS queue. This R example computes the hypotenuse of a right triangle, completing the Pythagorean triple (3, 4, 5).

Prepare an R input file with an appropriate filename, here named myjob.in:

# FILENAME:  myjob.in

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load R
cd $PBS_O_WORKDIR

# --vanilla: combine --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save < myjob.in

OR:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load R

# --vanilla: combine --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ
# --no-save: do not save datasets at the end of an R session
R --vanilla --no-save << EOF

# Compute a Pythagorean triple.
a = 3
b = 4
c = sqrt(a*a + b*b)
c     # display result
EOF

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.

R version 2.9.0 (2009-04-17)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # FILENAME:  myjob.in
>
> # Compute a Pythagorean triple.
> a = 3
> b = 4
> c = sqrt(a*a + b*b)
> c     # display result
[1] 5
>

Any output written to standard error will appear in myjob.sub.emyjobid.

To install additional R packages, create a folder in your home directory called Rlibs. You will need to be running a recent version of R (2.14.0 or greater as of this writing):

$ mkdir ~/Rlibs

If you are running the bash shell (the default on our clusters), add the following line to your .bashrc. Create the file ~/.bashrc if it does not already exist; you may also need to run "ln -s .bashrc .bash_profile" from your home directory if .bash_profile does not exist either:

export R_LIBS=~/Rlibs:$R_LIBS

If you are running csh or tcsh, add the following to your .cshrc:

setenv R_LIBS ~/Rlibs:$R_LIBS

Now run "source ~/.bashrc" (or "source ~/.cshrc" if you use csh or tcsh) and start R:

$ module load R/2.14.0
$ R
> .libPaths()
[1] "/home/myusername/Rlibs"        
[2] "/apps/rhel5/R-2.14.0/lib64/R/library"

.libPaths() should output something similar to the above if R_LIBS is set up correctly. Next, install a package, replacing packagename with the name of the package you need:

> install.packages('packagename',"~/Rlibs","http://streaming.stat.iastate.edu/CRAN")

The above command should download and install the requested R package, which upon completion can then be loaded.

> library('packagename')

If an R package relies on a library that is available only as a module (GDAL in this example), load that module before starting R and pass the library's location to install.packages:

$ module load gdal
$ module load R
$ R
> install.packages('rgdal',"~/Rlibs","http://streaming.stat.iastate.edu/CRAN", configure.args="--with-gdal-include=$GDAL_HOME/include --with-gdal-lib=$GDAL_HOME/lib")

Repeat install.packages(...) for any packages that you need. Your R packages should now be installed.
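
R_LIBS is a colon-separated list of directories, and the export line above places ~/Rlibs at the front of it. When R_LIBS is empty beforehand, that line leaves a trailing colon in the value. The following Python sketch shows one way to build such a search path without empty entries; the function prepend_search_path is hypothetical and is shown only to illustrate how the value is composed:

```python
import os


def prepend_search_path(new_dir, existing):
    """Prepend new_dir to a colon-separated search path, skipping empty entries."""
    parts = [new_dir] + [p for p in existing.split(":") if p]
    return ":".join(parts)


# Build the value that R_LIBS would hold for the current user.
rlibs = os.path.join(os.path.expanduser("~"), "Rlibs")
print(prepend_search_path(rlibs, os.environ.get("R_LIBS", "")))
```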

SAS

SAS (pronounced "sass") is an integrated system supporting statistical analysis, report generation, business planning, and forecasting. This section illustrates how to submit a small SAS job to a PBS queue. This SAS example displays a small dataset.

Prepare a SAS input file with an appropriate filename, here named myjob.sas:

* FILENAME:  myjob.sas

/* Display a small dataset. */
TITLE 'Display a Small Dataset';
DATA grades;
INPUT name $ midterm final;
DATALINES;
Anne     61 64
Bob      71 71
Carla    86 80
David    79 77
Edwardo  73 73
Fannie   81 81
;
PROC PRINT data=grades;
RUN;

Prepare a job submission file with an appropriate filename, here named myjob.sub:

#!/bin/sh -l
# FILENAME:  myjob.sub

module load sas
cd $PBS_O_WORKDIR

# -stdio:   run SAS in batch mode:
#              read SAS input from stdin
#              write SAS output to stdout
#              write SAS log to stderr
# -nonews:  do not display SAS news
# SAS runs in batch mode when the name of the SAS command file
# appears as a command-line argument.
sas -stdio -nonews myjob

Submit the job:

$ qsub -l nodes=1 myjob.sub

View job status:

$ qstat -u myusername

View results in the file for all standard output, myjob.sub.omyjobid:

                                                           The SAS System                       10:59 Wednesday, January 5, 2011   1

                                                 Obs    name       midterm    final

                                                  1     Anne          61        64
                                                  2     Bob           71        71
                                                  3     Carla         86        80
                                                  4     David         79        77
                                                  5     Edwardo       73        73
                                                  6     Fannie        81        81

View the SAS log in the standard error file, myjob.sub.emyjobid:

1                                                          The SAS System                           12:32 Saturday, January 29, 2011

NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software 9.2 (TS2M0)
      Licensed to PURDUE UNIVERSITY - T&R, Site 70063312.
NOTE: This session is executing on the Linux 2.6.18-194.17.1.el5rcac2 (LINUX) platform.



NOTE: SAS initialization used:
      real time           0.70 seconds
      cpu time            0.03 seconds

1          * FILENAME:  myjob.sas
2
3          /* Display a small dataset. */
4          TITLE 'Display a Small Dataset';
5          DATA grades;
6          INPUT name $ midterm final;
7          DATALINES;

NOTE: The data set WORK.GRADES has 6 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.18 seconds
      cpu time            0.01 seconds


14         ;
15         PROC PRINT data=grades;
16         RUN;

NOTE: There were 6 observations read from the data set WORK.GRADES.
NOTE: The PROCEDURE PRINT printed page 1.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.32 seconds
      cpu time            0.04 seconds


NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
      real time           1.28 seconds
      cpu time            0.08 seconds

Running Jobs via HTCondor

HTCondor allows you to run jobs on systems that would otherwise sit idle while their primary users do not need them. HTCondor is one of several distributed computing systems which ITaP makes available. Most ITaP research resources, in addition to being available through normal means, are part of BoilerGrid and are accessible via HTCondor.

If a primary user needs a processor core on a compute node, HTCondor immediately checkpoints and/or migrates all HTCondor jobs on that compute node and makes the resource available to the primary user. Shorter jobs therefore have a better completion rate via HTCondor than longer jobs. Even though HTCondor may have to restart jobs elsewhere, BoilerGrid can offer a vast amount of computational resources: nearly all ITaP research systems are part of BoilerGrid, as are large numbers of lab machines at the West Lafayette and other Purdue campuses. BoilerGrid is one of the largest HTCondor pools in the world. Some machines at other institutions are part of a larger HTCondor federation known as DiaGrid and are available as well.
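
To see why shorter jobs fare better, consider a deliberately simplified model. It assumes that a primary user reclaims a node at random with a constant average rate and that a reclaimed job is lost rather than checkpointed. Neither assumption comes from BoilerGrid measurements, and the rate of 0.1 reclaims per hour is purely hypothetical. Under this model the chance that a job runs to completion without interruption falls off exponentially with its length:

```python
import math


def completion_probability(job_hours, reclaims_per_hour):
    """Chance a job finishes uninterrupted under a constant reclaim rate.

    Assumption: reclaims arrive as a memoryless (Poisson) process, so
    the probability of seeing none during the job is exp(-rate * hours).
    """
    return math.exp(-reclaims_per_hour * job_hours)


# Hypothetical rate: one reclaim every ten hours on average.
for hours in (1, 4, 24):
    print(hours, round(completion_probability(hours, 0.1), 3))
```

With this made-up rate, a 1-hour job finishes uninterrupted about 90% of the time, a 4-hour job about 67%, and a 24-hour job about 9%. Real behavior depends on how busy the primary users are and on whether a job can checkpoint.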

Rossmann Frequently Asked Questions (FAQ)

There are currently no FAQs for Rossmann.