Carter - User Guide

  • 1  Conventions Used in this Document

    1  Conventions Used in this Document

    This document follows certain typesetting and naming conventions:

    • Colored, underlined text indicates a link.
    • Colored, bold text highlights something of particular importance.
    • Italicized text notes the first use of a key concept or term.
    • Bold, fixed-width font text indicates a command or command argument that you type verbatim.
    • Examples of commands and output as you would see them on the command line will appear in colored blocks of fixed-width text such as this:
      $ example
      This is an example of commands and output.
      
    • All command line shell prompts appear as a single dollar sign ("$"). Your actual shell prompt may differ.
    • All examples work with bash or ksh shells. Where different, changes needed for tcsh or csh shell users appear in example comments.
    • All names that begin with "my" illustrate examples that you replace with an appropriate name. These include "myusername", "myfilename", "mydirectory", "myjobid", etc.
    • The terms "processor core" and "core" throughout this guide refer to an individual CPU core on a processor chip.
  • 2  Overview of Carter

    2  Overview of Carter

    Carter was launched through an ITaP partnership with Intel in November 2011 and is a member of Purdue's Community Cluster Program. Carter primarily consists of HP compute nodes with two 8-core Intel Xeon-E5 processors (16 cores per node) and between 32 GB and 256 GB of memory. A few NVIDIA GPU-accelerated nodes are also available. All nodes have 56 Gbps FDR Infiniband connections and a 5-year warranty. Carter is planned to be decommissioned on April 30, 2017.

    To purchase access to Carter today, go to the Carter Access Purchase page. Please subscribe to our Community Cluster Program Mailing List to stay informed on the latest purchasing developments or contact us via email at rcac-cluster-purchase@lists.purdue.edu if you have any questions.

    • 2.1  Namesake

      2.1  Namesake

      Carter is named in honor of Dennis Lee Carter. More information about his life and impact on Purdue is available in an ITaP Biography of Dennis Lee Carter.

    • 2.2  Detailed Hardware Specification

      2.2  Detailed Hardware Specification

      Most Carter nodes consist of identical hardware. All Carter nodes have 16 processor cores, between 32 GB and 256 GB RAM, and 56 Gbps Infiniband interconnects. Carter G nodes are also each equipped with three NVIDIA Tesla GPUs that may be used to further accelerate work tailored to these GPUs.

      Sub-Cluster | Number of Nodes | Processors per Node                                      | Cores per Node | Memory per Node | Interconnect           | TeraFLOPS
      Carter-A    | 556             | Two 8-Core Intel Xeon-E5                                 | 16             | 32 GB           | 56 Gbps FDR Infiniband | 165.6
      Carter-B    | 80              | Two 8-Core Intel Xeon-E5                                 | 16             | 64 GB           | 56 Gbps FDR Infiniband | 20.1
      Carter-C    | 12              | Two 8-Core Intel Xeon-E5                                 | 16             | 256 GB          | 56 Gbps FDR Infiniband | 0.6
      Carter-G    | 12              | Two 8-Core Intel Xeon-E5 + Three NVIDIA Tesla M2090 GPUs | 16             | 128 GB          | 56 Gbps FDR Infiniband | n/a

      Carter nodes run Red Hat Enterprise Linux 6 (RHEL6) and use Moab Workload Manager 7 and TORQUE Resource Manager 4 as the portable batch system (PBS) for resource and job management. Operating system patches are applied as security needs dictate. All nodes allow unlimited stack usage and unlimited core dump size (though disk space and server quotas may still be a limiting factor).
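
      To confirm these limits yourself, you can use the standard ulimit shell builtin from a bash session on Carter (a quick check only; both values should report "unlimited"):

        (check the stack size and core dump size limits)
      $ ulimit -s -c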

      For more information about the TORQUE Resource Manager:

      On Carter, ITaP recommends the following compiler, math library, and message-passing library for parallel code:

      • Intel 13.1.1.163
      • MKL
      • OpenMPI 1.6.3

      To load the recommended set:

      $ module load devel
      

      To verify what you loaded:

      $ module list
      
    • 2.3  Node Interconnect Systems

      2.3  Node Interconnect Systems

      The system interconnect is the networking technology that connects nodes of a cluster to each other. This is often much faster and sometimes radically different from the networking available between a resource and other machines or the outside world. Interconnects have different characteristics that may affect parallel message-passing programs and their design. Each ITaP research resource has different interconnect options available, and some have more than one available to all or only portions of the resource's nodes. For information on which interconnects are available, refer to the hardware specification for the resource above. Details about the specific interconnects available on Carter follow.

      • 2.3.1  FDR InfiniBand

        2.3.1  FDR InfiniBand

        FDR InfiniBand is the latest generation of the Infiniband switched fabric communications link primarily used in high-performance computing, and boasts speeds of 14 Gb/s per lane. As with previous versions of InfiniBand, its design is scalable, and its architecture specification defines a connection between compute nodes and high-performance I/O nodes.

        Establishing an InfiniBand connection carries some overhead, but once a connection is up, communication is very fast. InfiniBand is therefore well suited to workloads that exchange large amounts of data over relatively few connection initiations between devices.

        InfiniBand was the result of the merger of two competing designs: Future I/O (developed by Compaq, IBM, and Hewlett-Packard) and Next Generation I/O (developed by Intel, Microsoft, and Sun).

  • 3  Accounts on Carter

    3  Accounts on Carter

    • 3.1  Purchasing Nodes

      3.1  Purchasing Nodes

      Information Technology at Purdue (ITaP) operates a significant shared cluster computing infrastructure developed over several years through focused acquisitions using funds from grants, faculty startup packages, and institutional sources. These "community clusters" are now at the foundation of Purdue's research cyberinfrastructure.

      We strongly encourage any Purdue faculty or staff with computational needs to join this growing community and enjoy the enormous benefits this shared infrastructure provides:

      • Peace of Mind
        ITaP system administrators take care of security patches, attempted hacks, operating system upgrades, and hardware repair so faculty and graduate students can concentrate on research.
      • Low Overhead
        ITaP data centers provide infrastructure such as networking, racks, floor space, cooling, and power.
      • Cost Effective
        ITaP works with vendors to obtain the best price for computing resources by pooling funds from different disciplines to leverage greater group purchasing power.

      Through the Community Cluster Program, Purdue affiliates have invested several million dollars in computational and storage resources from Q4 2006 to the present with great success in both the research accomplished and the money saved on equipment purchases.

      For more information or to purchase access to our latest cluster today, see the Access Purchase page. To get updates on ITaP's community cluster program, please subscribe to the Community Cluster Program Mailing List.

    • 3.2  Cluster Partner Services

      3.2  Cluster Partner Services

      In addition to priority access to a number of processor cores, partners in our Community Cluster Program may also take advantage of additional services offered to them free of charge. These include:

      • Unix Group
        Restrict access to files or programs by using Unix file permissions on the basis of those you approve for access to your queues.
      • Application Storage
        Store your custom application binaries in central storage that is backed up and available from all clusters, but is not part of your personal home directory.
      • Subversion (SVN) Repository
        Store and manage your code or documents through a centrally-supported, professional-grade, revision control system.

      To request any of these be created for your research group, or for more information, please email rcac-help@purdue.edu.

    • 3.3  Obtaining an Account

      3.3  Obtaining an Account

      To obtain an account, you must be part of a research group which has purchased access to Carter. Refer to the Accounts / Access page for more details on how to request access.

    • 3.4  Login / SSH

      3.4  Login / SSH

      To submit jobs on Carter, log in to the submission host carter.rcac.purdue.edu via SSH. This submission host is actually four front-end hosts: carter-fe00, carter-fe01, carter-fe02, and carter-fe03. The login process randomly assigns one of these front-ends to each login to carter.rcac.purdue.edu. While all of these front-end hosts are identical, each has its own /tmp, so data placed in /tmp during one session may not be available in a later session. ITaP advises using scratch space for multi-session, shared data instead.

      • 3.4.1  SSH Client Software

        3.4.1  SSH Client Software

        Secure Shell or SSH is a way of establishing a secure (encrypted) connection between two computers. It uses public-key cryptography to authenticate the remote computer and (optionally) to allow the remote computer to authenticate the user. Its usual function involves logging in to a remote machine and executing commands, but it also supports tunneling and forwarding of X11 or arbitrary TCP connections. There are many SSH clients available for all operating systems.

        Linux / Solaris / AIX / HP-UX / Unix:

        • The ssh command is pre-installed. Log in using ssh myusername@servername.

        Microsoft Windows:

        • PuTTY is an extremely small download of a free, full-featured SSH client.
        • Secure CRT is a commercial SSH client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

        Mac OS X:

        • The ssh command is pre-installed. You may start a local terminal window from "Applications->Utilities". Log in using ssh myusername@servername.
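
        For example, to reach the Carter submission host described above, replacing myusername with your career account name:

        $ ssh myusername@carter.rcac.purdue.edu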
      • 3.4.2  SSH Keys

        3.4.2  SSH Keys

        SSH works with many different means of authentication. One popular authentication method is Public Key Authentication (PKA). PKA is a method of establishing your identity to a remote computer using related sets of encryption data called keys. PKA is a more secure alternative to traditional password-based authentication with which you are probably familiar.

        To employ PKA via SSH, you manually generate a keypair (also called SSH keys) in the location from where you wish to initiate a connection to a remote machine. This keypair consists of two text files: a private key and a public key. You keep the private key file confidential on your local machine or local home directory (hence the name "private" key). You then log in to the remote machine (if possible) and append the corresponding public key text to the end of the ~/.ssh/authorized_keys file in your home directory there, or have a system administrator do so on your behalf. In future login attempts, PKA compares the public and private keys to verify your identity; only then do you have access to the remote machine.

        As a user, you can create, maintain, and employ as many keypairs as you wish. If you connect to a computational resource from your work laptop, your work desktop, and your home desktop, you can create and employ keypairs on each. You can also create multiple keypairs on a single local machine to serve different purposes, such as establishing access to different remote machines or establishing different types of access to a single remote machine. In short, PKA via SSH offers a secure but flexible means of identifying yourself to all kinds of computational resources.
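
        As a minimal sketch using the standard OpenSSH tools, you might generate and install a keypair as follows. The key type shown is only an example, and ssh-copy-id may not be present on every system; if it is not, you can instead append the contents of your public key file to the ~/.ssh/authorized_keys file on the remote machine by hand.

          (generate a keypair on your local machine; you will be prompted for a passphrase)
        $ ssh-keygen -t rsa

          (copy the public key to the remote machine's authorized_keys file)
        $ ssh-copy-id myusername@carter.rcac.purdue.edu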

        Passphrases and SSH Keys

        Creating a keypair prompts you to provide a passphrase for the private key. This passphrase differs from a password in a number of ways. First, a passphrase is, as the name implies, a phrase: it can include most types of characters, including spaces, and has no limit on length. Second, the remote machine never receives this passphrase for verification. Its only purpose is to unlock the use of a particular local private key, and it applies only to that key.

        Perhaps you are wondering why you would need a private key passphrase at all when using PKA. If the private key remains secure, why the need for a passphrase just to use it? Indeed, if the location of your private keys were always completely secure, a passphrase might not be necessary. In reality, a number of situations could arise in which someone may improperly gain access to your private key files. In these situations, a passphrase offers another level of security for you, the user who created the keypair.

        Think of the private key/passphrase combination as being analogous to your ATM card/PIN combination. The ATM card itself is the object that grants access to your important accounts, and as such, should remain secure at all times—just as a private key should. But if you ever lose your wallet or someone steals your ATM card, you are glad that your PIN exists to offer another level of protection. The same is true for a private key passphrase.

        When you create a keypair, you should always provide a corresponding private key passphrase. For security purposes, avoid using phrases which automated programs can discover (e.g. phrases that consist solely of words in English-language dictionaries). This passphrase is not recoverable if forgotten, so make note of it. Only a few situations warrant using a non-passphrase-protected private key—conducting automated file backups is one such situation. If you need to use a non-passphrase-protected private key to conduct automated backups to Fortress, see the No-Passphrase SSH Keys section.

      • 3.4.3  SSH X11 Forwarding

        3.4.3  SSH X11 Forwarding

        SSH supports tunneling of X11 (X-Windows). If you have an X11 server running on your local machine, you may use X11 applications on remote systems and have their graphical displays appear on your local machine. These X11 connections are tunneled and encrypted automatically by your SSH client.

        Installing an X11 Server

        To use X11, you will need to have a local X11 server running on your personal machine. Both free and commercial X11 servers are available for various operating systems.

        Linux / Solaris / AIX / HP-UX / Unix:

        • An X11 server is at the core of all graphical sessions. If you are logged in to a graphical environment on these operating systems, you are already running an X11 server.
        • ThinLinc is an alternative to running an X11 server directly on your Linux computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. See the ThinLinc section for information on using it.

        Microsoft Windows:

        • ThinLinc is an alternative to running an X11 server directly on your Windows computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. See the ThinLinc section for information on using it.
        • Xming is a free X11 server available for all versions of Windows, although it may occasionally hang and require a restart. Download the "Public Domain Xming" or donate to the development for the newest version.
        • Hummingbird eXceed is a commercial X11 server available for all versions of Windows.
        • Cygwin is another free X11 server available for all versions of Windows. Download and run setup.exe. During installation, you must select the following packages which are not included by default:
          • X-startup-scripts
          • XFree86-lib-compat
          • xorg-*
          • xterm
          • xwinwm
          • lib-glitz-glx1
          • opengl (if you also want OpenGL support, under the Graphics group)
          After installation, open a Cygwin terminal, type XWin -multiwindow, and press Enter to start the Cygwin X server. You may then run your SSH client.

        Mac OS X:

        • X11 is available as an optional install on the Mac OS X install disks prior to 10.7/Lion. Run the installer, select the X11 option, and follow the instructions. For 10.7+ please download XQuartz.
        • ThinLinc is an alternative to running an X11 server directly on your Mac computer. ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. See the ThinLinc section for information on using it.

        Enabling X11 Forwarding in your SSH Client

        Once you are running an X11 server, you will need to enable X11 forwarding/tunneling in your SSH client:

        • "ssh": X11 tunneling should be enabled by default. To be certain it is enabled, you may use ssh -Y.
        • PuTTY: Prior to connection, in your connection's options, under "X11", check "Enable X11 forwarding", and save your connection.
        • Secure CRT: Right-click a saved connection, and select "Properties". Expand the "Connection" settings, then go to "Port Forwarding" -> "Remote/X11". Check "Forward X11 packets" and click "OK".

        SSH will set the remote environment variable $DISPLAY to "localhost:XX.YY" when this is working correctly. If you had previously set your $DISPLAY environment variable to your local IP or hostname, you must remove any set/export/setenv of this variable from your login scripts. The environment variable $DISPLAY must be left as SSH sets it, which is to a random local port address. Setting $DISPLAY to an IP or hostname will not work.
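
        To verify that forwarding is working once you are logged in, you can check $DISPLAY and launch a simple graphical program. The display number shown here is only an example, and xclock is used only as a convenient test program; any X11 application will do.

        $ echo $DISPLAY
        localhost:10.0

        $ xclock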

      • 3.4.4  Thinlinc Remote Desktop

        3.4.4  Thinlinc Remote Desktop

        ITaP Research Computing provides ThinLinc as an alternative to running an X11 server directly on your computer. It allows you to run graphical applications or graphical interactive jobs directly on Carter through a persistent remote graphical desktop session.

        ThinLinc is a service that allows you to connect to a persistent remote graphical desktop session. This service works very well over a high-latency, low-bandwidth, or off-campus connection compared to running an X11 server locally. It is also very helpful for Windows users who do not have an easy-to-use local X11 server, as little to no setup is required on your computer.

        There are two ways to use ThinLinc: through the native client (preferred) or through a web browser.

        Installing the ThinLinc native client

        The native ThinLinc client offers the best experience, especially over off-campus connections, and is the recommended method for using ThinLinc. It is compatible with Windows, Mac OS X, and Linux.

        • Download the ThinLinc client from the ThinLinc website.
        • Start the ThinLinc client on your computer.
        • In the client's login window, use thinlinc.rcac.purdue.edu as the Server. Use your Purdue Career Account username and password.
        • Click the Connect button.
        • Continue to the following section on connecting to Carter from ThinLinc.

        Using ThinLinc through your web browser

        The ThinLinc service can be accessed from your web browser as an alternative to installing the native client. This option requires no setup and is a good choice on computers where you do not have privileges to install software. All that is required is an up-to-date web browser. Older versions of Internet Explorer may not work.

        • Open a web browser and navigate to thinlinc.rcac.purdue.edu.
        • Log in with your Purdue Career Account username and password.
        • You may safely proceed past any warning messages from your browser.
        • Continue to the following section on connecting to Carter from ThinLinc.
          • Connecting to Carter from ThinLinc

            • Once logged in, you will be presented with a remote Linux desktop.
            • Open the terminal application on the remote desktop.
            • Log in to the submission host carter.rcac.purdue.edu with X forwarding enabled using the following command:
              $ ssh -Y carter.rcac.purdue.edu 
            • Once logged in to the Carter head node, you may use graphical editors, debuggers, software like MATLAB, or run graphical interactive jobs. For example, to test the X forwarding connection, issue the following command to launch the graphical editor gedit:
              $ gedit
            • This session will remain persistent even if you disconnect from the session. Any interactive jobs or applications you left running will continue running even if you are not connected to the session.

            Tips for using ThinLinc native client

            • To exit a full screen ThinLinc session press the F8 key on your keyboard (fn + F8 key for Mac users) and click to disconnect or exit full screen.
            • Full screen mode can be disabled when connecting to a session by clicking the Options button and disabling full screen mode from the Screen tab.
    • 3.5  Passwords

      3.5  Passwords

      If you have received a default password as part of the process of obtaining your account, you should change it before you log onto Carter for the first time. Change your password from the SecurePurdue website. You will have the same password on all ITaP systems such as Carter, Purdue email, or Blackboard.

      Passwords may need to be changed periodically in accordance with Purdue security policies. Passwords must follow the guidelines described on the SecurePurdue webpage, and ITaP recommends choosing a strong password.

      ITaP staff will NEVER ask for your password, by email or otherwise.

      Never share your password with another user or make your password known to anyone else.

    • 3.6  Email

      3.6  Email

      There is no local email delivery available on Carter. Carter forwards all email which it receives to your career account email address.

    • 3.7  Login Shell

      3.7  Login Shell

      Your shell is the program that generates your command-line prompt and processes commands. On ITaP research systems, several common shell choices are available:

      Name Description Path
      bash A Bourne-shell (sh) compatible shell with many newer advanced features as well. Bash is the default shell for new ITaP research system accounts. This is the most common shell in use on ITaP research systems. /bin/bash
      tcsh An advanced variant on csh with all the features of modern shells. Tcsh is the second most popular shell in use today. /bin/tcsh
      zsh An advanced shell which incorporates all the functionality of bash and tcsh combined, usually with identical syntax. /bin/zsh

      To find out what shell you are running right now, simply use the ps command:

      $ ps
        PID TTY          TIME CMD
      30181 pts/27   00:00:00 bash
      30273 pts/27   00:00:00 ps
      

      To use a different shell on a one-time or trial basis, simply type the shell name as a command. To return to your original shell, type exit:

      $ ps
        PID TTY          TIME CMD
      30181 pts/27   00:00:00 bash
      30273 pts/27   00:00:00 ps
      
      $ tcsh
      % ps
        PID TTY          TIME CMD
      30181 pts/27   00:00:00 bash
      30313 pts/27   00:00:00 tcsh
      30315 pts/27   00:00:00 ps
      
      % exit
      $
      

      To permanently change your default login shell, use the secure web form provided to change shells.

      There is a propagation delay of up to two hours before this change takes effect. Once propagated, you will need to log out and log back in to start in your new shell.
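
      To confirm which shell your login sessions start in, you can check the $SHELL environment variable after logging back in (the value shown here is only an example):

      $ echo $SHELL
      /bin/bash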

  • 4  File Storage and Transfer for Carter

    4  File Storage and Transfer for Carter

    • 4.1  Storage Options

      4.1  Storage Options

      File storage options on ITaP research systems include long-term storage (home directories, Fortress) and short-term storage (scratch directories, /tmp directory). Each option has different performance and intended uses, and some options vary from system to system as well. ITaP provides daily snapshots of home directories for a limited time for accidental deletion recovery. ITaP does not back up scratch directories or temporary storage and regularly purges old files from scratch and /tmp directories. More details about each storage option appear below.

      • 4.1.1  Home Directory

        4.1.1  Home Directory

        ITaP provides home directories for long-term file storage. Each user has one home directory. You should use your home directory for storing important program files, scripts, input data sets, critical results, and frequently used files. You should store infrequently used files on Fortress. Your home directory becomes your current working directory, by default, when you log in.

        ITaP provides daily snapshots of your home directory for a limited period of time in the event of accidental deletion. For additional security, you should store another copy of your files on more permanent storage, such as the Fortress HPSS Archive.

        Your home directory physically resides within the Isilon storage system at Purdue. To find the path to your home directory, first log in then immediately enter the following:

        $ pwd
        /home/myusername
        

        Or from any subdirectory:

        $ echo $HOME
        /home/myusername
        

        Your home directory and its contents are available on all ITaP research computing machines, including front-end hosts and compute nodes.

        Your home directory has a quota limiting the total size of files you may store within. For more information, refer to the Storage Quotas / Limits Section.

        • 4.1.1.1  Lost Home Directory File Recovery

          4.1.1.1  Lost Home Directory File Recovery

          Only files which have been snap-shotted overnight are recoverable. If you lose a file the same day you created it, it is NOT recoverable.

          To recover files lost from your home directory, use the flost command:

          $ flost
          
      • 4.1.2  Scratch Space

        4.1.2  Scratch Space

        ITaP provides scratch directories for short-term file storage only. The quota of your scratch directory is much greater than the quota of your home directory. You should use your scratch directory for storing temporary input files which your job reads or for writing temporary output files which you may examine after execution of your job. You should use your home directory and Fortress for longer-term storage or for holding critical results. The hsi and htar commands provide easy-to-use interfaces into the archive and can be used to copy files into the archive interactively or even automatically at the end of your regular job submission scripts.
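
        For example, a job submission script might archive its results to Fortress with htar as its final step. The following is only a sketch: the PBS directives, queue name, program name, and file names are placeholders that you would replace with your own.

        #!/bin/bash
        #PBS -q myqueuename
        #PBS -l nodes=1:ppn=16
        #PBS -l walltime=01:00:00

        # work in the scratch directory
        cd $RCAC_SCRATCH/mydirectory

        # run the computation, writing output into scratch
        ./myprogram > myoutput.txt

        # archive the results to Fortress once the computation finishes
        htar -cvf myresults.tar myoutput.txt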

        Files in scratch directories are not recoverable. ITaP does not back up files in scratch directories. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

        ITaP purges files from scratch directories that have not been accessed or had their contents modified in 90 days. Owners of these files receive an email notice one week before removal. Be sure to regularly check your Purdue email account or set up mail forwarding to an email account you do regularly check. For more information, please refer to our Scratch File Purging Policy.

        All users may access scratch directories on Carter. To find the path to your scratch directory:

        $ findscratch
        /scratch/carter/m/myusername
        

        The value of variable $RCAC_SCRATCH is your scratch directory path. Use this variable in any scripts. Your actual scratch directory path may change without warning, but this variable will remain current.

        $ echo $RCAC_SCRATCH
        /scratch/carter/m/myusername
        

        All scratch directories are available on each front-end of all computational resources; however, only the /scratch/carter directory is available on Carter compute nodes. No other scratch directories are available on Carter compute nodes.

        To find the path to someone else's scratch directory:

        $ findscratch someusername
        /scratch/carter/s/someusername
        

        Your scratch directory has a quota capping the total size and number of files you may store in it. For more information, refer to the Storage Quotas / Limits section.

      • 4.1.3  /tmp Directory

        4.1.3  /tmp Directory

        ITaP provides /tmp directories for short-term file storage only. Each front-end and compute node has a /tmp directory. Your program may write temporary data to the /tmp directory of the compute node on which it is running. That data is available for as long as your program is active. Once your program terminates, that temporary data is no longer available. When used properly, /tmp may provide faster local storage to an active process than any other storage option. You should use your home directory and Fortress for longer-term storage or for holding critical results.

        ITaP does not perform backups for the /tmp directory and removes files from /tmp whenever space is low or whenever the system needs a reboot. In the event of a disk crash or file purge, files in /tmp are not recoverable. You should copy any important files to more permanent storage.
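
        A minimal sketch of this pattern follows; the program and file names are placeholders. Copy input to /tmp on the node, work against the fast local copy, and copy anything worth keeping back to scratch before your program or job ends.

          (stage input from scratch to node-local /tmp)
        $ cp $RCAC_SCRATCH/myinput.dat /tmp/

          (run against the local copy)
        $ ./myprogram /tmp/myinput.dat > /tmp/myoutput.dat

          (copy results back to scratch; /tmp contents are not kept)
        $ cp /tmp/myoutput.dat $RCAC_SCRATCH/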

      • 4.1.4  Long-Term Storage

        4.1.4  Long-Term Storage

        Long-term or permanent storage is available to ITaP research users on the High Performance Storage System (HPSS), an archival storage system called Fortress. HPSS is a software package that manages a hierarchical storage system. Program files, data files, and any other files which are not used often but which must be saved can be put in permanent storage. Fortress currently has over 10 PB of capacity.

        Files smaller than 100 MB have their primary copy stored on low-cost disks (disk cache), but the second copy (backup of disk cache) is on tape or optical disks. This provides a rapid restore time to the disk cache. However, the large latency to access a larger file (usually involving a copy from a tape cartridge) makes it unsuitable for direct use by any processes or jobs, even where possible. The primary and secondary copies of larger files are stored on separate tape cartridges in the tape library.

        To ensure optimal performance for all users, and to keep the Fortress system healthy, please remember the following tips:

        • Fortress operates most effectively with large files (1 GB or larger). If your data consists of smaller files, use HTAR to create archives directly in Fortress.
        • When working with files on cluster head nodes, use your home directory or a scratch file system, rather than editing or computing on files directly in Fortress. Copy any data you wish to archive to Fortress after computation is complete.

        Fortress writes two copies of every file, either to two tapes or to disk and a tape, to protect against medium errors. Unfortunately, Fortress does not automatically switch to the alternate copy when it has trouble accessing the primary. If it seems to be taking an extraordinary amount of time to retrieve a file (hours), please email rcac-help@purdue.edu. We can then investigate why it is taking so long. If it is an error on the primary copy, we will instruct Fortress to switch to the alternate copy as the primary and recreate a new alternate copy.

        For more information about Fortress, how it works, user guides, and how to obtain an account:

        • 4.1.4.1  Manual File Transfer to Long-Term Storage

          4.1.4.1  Manual File Transfer to Long-Term Storage

          There are a variety of ways to manually transfer files to your Fortress home directory for long-term storage.

          • 4.1.4.1.1  HSI

            4.1.4.1.1  HSI

            HSI, the Hierarchical Storage Interface, is the preferred method of transferring files to and from Fortress. HSI is designed to be a friendly interface for users of the High Performance Storage System (HPSS). It provides a familiar Unix-style environment for working within HPSS while automatically taking advantage of high-speed, parallel file transfers without requiring any special user knowledge.

            HSI is provided on all ITaP research systems as the command hsi. HSI is also available for Download for many operating systems.

            Interactive usage:

            $ hsi
            
            *************************************************************************
            *                    Purdue University
            *                  High Performance Storage System (HPSS)
            *************************************************************************
            * This is the Purdue Data Archive, Fortress.  For further information
            * see http://www.rcac.purdue.edu/storage/fortress/
            *
            *   If you are having problems with HPSS, please call IT/Operational
            *   Services at 49-44000 or send E-mail to rcac-help@purdue.edu.
            *
            *************************************************************************
            Username: myusername  UID: 12345  Acct: 12345(12345) Copies: 1 Firewall: off [hsi.3.5.8 Wed Sep 21 17:31:14 EDT 2011]
            
            [Fortress HSI]/home/myusername->put data1.fits
            put  'data1.fits' : '/home/myusername/data1.fits' ( 1024000000 bytes, 250138.1 KBS (cos=11))
            
            [Fortress HSI]/home/myusername->lcd /tmp
            
            [Fortress HSI]/home/myusername->get data1.fits
            get  '/tmp/data1.fits' : '/home/myusername/data1.fits' (2011/10/04 16:28:50 1024000000 bytes, 325844.9 KBS )
            
            [Fortress HSI]/home/myusername->quit
            

            Batch transfer file:

            put data1.fits
            put data2.fits
            put data3.fits
            put data4.fits
            put data5.fits
            put data6.fits
            put data7.fits
            put data8.fits
            put data9.fits
            

            Batch usage:

            $ hsi < my_batch_transfer_file
            *************************************************************************
            *                    Purdue University
            *                  High Performance Storage System (HPSS)
            *************************************************************************
            * This is the Purdue Data Archive, Fortress.  For further information
            * see http://www.rcac.purdue.edu/storage/fortress/
            *
            *   If you are having problems with HPSS, please call IT/Operational
            *   Services at 49-44000 or send E-mail to rcac-help@purdue.edu.
            *
            *************************************************************************
            Username: myusername  UID: 12345  Acct: 12345(12345) Copies: 1 Firewall: off [hsi.3.5.8 Wed Sep 21 17:31:14 EDT 2011]
            put  'data1.fits' : '/home/myusername/data1.fits' ( 1024000000 bytes, 250200.7 KBS (cos=11))
            put  'data2.fits' : '/home/myusername/data2.fits' ( 1024000000 bytes, 258893.4 KBS (cos=11))
            put  'data3.fits' : '/home/myusername/data3.fits' ( 1024000000 bytes, 222819.7 KBS (cos=11))
            put  'data4.fits' : '/home/myusername/data4.fits' ( 1024000000 bytes, 224311.9 KBS (cos=11))
            put  'data5.fits' : '/home/myusername/data5.fits' ( 1024000000 bytes, 323707.3 KBS (cos=11))
            put  'data6.fits' : '/home/myusername/data6.fits' ( 1024000000 bytes, 320322.9 KBS (cos=11))
            put  'data7.fits' : '/home/myusername/data7.fits' ( 1024000000 bytes, 253192.6 KBS (cos=11))
            put  'data8.fits' : '/home/myusername/data8.fits' ( 1024000000 bytes, 253056.2 KBS (cos=11))
            put  'data9.fits' : '/home/myusername/data9.fits' ( 1024000000 bytes, 323218.9 KBS (cos=11))
            EOF detected on TTY - ending HSI session
            

            For more information about HSI:

          • 4.1.4.1.2  HTAR

            4.1.4.1.2  HTAR

            HTAR (short for "HPSS TAR") is a utility program that writes TAR-compatible archive files directly into Fortress, without having to first create a local file. Its command line was originally based on the AIX tar program, with a number of extensions added to provide extra features.

            HTAR is provided on all ITaP research systems as the command htar. HTAR is also available for Download for many operating systems.

            Usage:

              (Create a tar archive in Fortress named data.tar including all files with the extension ".fits".)
            $ htar -cvf data.tar *.fits
            HTAR: a   data1.fits
            HTAR: a   data2.fits
            HTAR: a   data3.fits
            HTAR: a   data4.fits
            HTAR: a   data5.fits
            HTAR: a   data6.fits
            HTAR: a   data7.fits
            HTAR: a   data8.fits
            HTAR: a   data9.fits
            HTAR: a   /tmp/HTAR_CF_CHK_17953_1317760775
            HTAR Create complete for data.tar. 9,216,006,144 bytes written for 9 member files, max threads: 3 Transfer time: 29.622 seconds (311.121 MB/s)
            HTAR: HTAR SUCCESSFUL
            
              (Unpack the tar archive named data.tar from Fortress into a scratch directory for use in a batch job.)
            $ cd $RCAC_SCRATCH/job_dir
            $ htar -xvf data.tar
            HTAR: x data1.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data2.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data3.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data4.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data5.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data6.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data7.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data8.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: x data9.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: Extract complete for data.tar, 9 files. total bytes read: 9,216,004,608 in 33.914 seconds (271.749 MB/s )
            HTAR: HTAR SUCCESSFUL
            
              (Look at the contents of the data.tar HTAR archive in Fortress.)
            $ htar -tvf data.tar
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:30  data1.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data2.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data3.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data4.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data5.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data6.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data7.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data8.fits
            HTAR: -rw-r--r--  myusername/pucc 1024000000 2011-10-04 16:35  data9.fits
            HTAR: -rw-------  myusername/pucc        256 2011-10-04 16:39  /tmp/HTAR_CF_CHK_17953_1317760775
            HTAR: Listing complete for data.tar, 10 files 10 total objects
            HTAR: HTAR SUCCESSFUL
            
              (Unpack a single file, "data7.fits", from the tar archive named data.tar in Fortress into a scratch directory.)
            $ htar -xvf data.tar data7.fits
            HTAR: x data7.fits, 1024000000 bytes, 2000001 media blocks
            HTAR: Extract complete for data.tar, 1 files. total bytes read: 1,024,000,512 in 3.642 seconds (281.166 MB/s )
            HTAR: HTAR SUCCESSFUL
            

            For more information about HTAR:

          • 4.1.4.1.3  SCP

            4.1.4.1.3  SCP

            Fortress does NOT support SCP.

          • 4.1.4.1.4  SFTP

            4.1.4.1.4  SFTP

            Fortress does NOT support SFTP.

    • 4.2  Environment Variables

      4.2  Environment Variables

      Several environment variables are automatically defined for you to help you manage your storage. Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change. Some of the environment variables you should have are:

      Name Description
      HOME path to your home directory
      PWD path to your current directory
      RCAC_SCRATCH path to scratch filesystem

      By convention, environment variable names are all uppercase. You may use them on the command line or in any scripts in place of and in combination with hard-coded values:

      $ ls $HOME
      ...
      
      $ ls $RCAC_SCRATCH/myproject
      ...
      

      To find the value of any environment variable:

      $ echo $RCAC_SCRATCH
      /scratch/carter/m/myusername
      

      To list the values of all environment variables:

      $ env
      USER=myusername
      HOME=/home/myusername
      RCAC_SCRATCH=/scratch/carter/m/myusername
      ...
      

      You may create or overwrite an environment variable. To pass (export) the value of a variable in either bash or ksh:

      $ export MYPROJECT=$RCAC_SCRATCH/myproject
      

      To assign a value to an environment variable in either tcsh or csh:

      $ setenv MYPROJECT value
      
    • 4.3  Storage Quotas / Limits

      4.3  Storage Quotas / Limits

      ITaP imposes some limits on your disk usage on research systems. ITaP implements a quota on each filesystem. Each filesystem (home directory, scratch directory, etc.) may have a different limit. If you exceed the quota, you will not be able to save new files or new data to the filesystem until you delete or move data to long-term storage.

      • 4.3.1  Checking Quota Usage

        4.3.1  Checking Quota Usage

        To check the current quotas of your home and scratch directories use the myquota command:

        $ myquota
        Type        Filesystem          Size    Limit  Use         Files    Limit  Use
        ==============================================================================
        home        extensible         5.0GB   10.0GB  50%             -        -   -
        scratch     /scratch/carter/    8KB  476.8GB   0%             2  100,000   0%
        

        The columns are as follows:

        1. Type: indicates home or scratch directory.
        2. Filesystem: name of storage option.
        3. Size: sum of file sizes in bytes.
        4. Limit: allowed maximum on sum of file sizes in bytes.
        5. Use: percentage of file-size limit currently in use.
        6. Files: number of files and directories (not the size).
        7. Limit: allowed maximum on number of files and directories. It is possible, though unlikely, to reach this limit and not the file-size limit if you create a large number of very small files.
        8. Use: percentage of file-number limit currently in use.

        If you find that you reached your quota in either your home directory or your scratch file directory, obtain estimates of your disk usage. Find the top-level directories which have a high disk usage, then study the subdirectories to discover where the heaviest usage lies.

        To see in a human-readable format an estimate of the disk usage of your top-level directories in your home directory:

        $ du -h --max-depth=1 $HOME
        32K     /home/myusername/mysubdirectory_1
        529M    /home/myusername/mysubdirectory_2
        608K    /home/myusername/mysubdirectory_3
        

        The second directory is the largest of the three, so apply the du command to it.
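
        For example, to drill one level into that subdirectory (using the directory name from the listing above):

        $ du -h --max-depth=1 $HOME/mysubdirectory_2
        ...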

        To see in a human-readable format an estimate of the disk usage of your top-level directories in your scratch file directory:

        $ du -h --max-depth=1 $RCAC_SCRATCH
        160K    /scratch/carter/m/myusername
        

        This strategy can be very helpful in figuring out the location of your largest usage. Move unneeded files and directories to long-term storage to free space in your home and scratch directories.

      • 4.3.2  Increasing Your Storage Quota

        4.3.2  Increasing Your Storage Quota

        Home Directory

        If you find you need additional disk space in your home directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may go to the BoilerBackpack Quota Management site and use the sliders there to increase the amount of space allocated to your research home directory vs. other storage options, up to a maximum of 100GB.

        Scratch Space

        If you find you need additional disk space in your scratch space, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may ask for a quota increase at rcac-help@purdue.edu. Quota requests up to 2TB and 200,000 files on LustreA or LustreC can be processed quickly.

    • 4.4  Archive and Compression

      4.4  Archive and Compression

      There are several options for archiving and compressing groups of files or directories on ITaP research systems. The most commonly used options are:

      • tar   (more information)
        Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.
        Examples:
          (list contents of archive somefile.tar)
        $ tar tvf somefile.tar
        
          (extract contents of somefile.tar)
        $ tar xvf somefile.tar
        
          (extract contents of gzipped archive somefile.tar.gz)
        $ tar xzvf somefile.tar.gz
        
          (extract contents of bzip2 archive somefile.tar.bz2)
        $ tar xjvf somefile.tar.bz2
        
          (archive all ".c" files in current directory into one archive file)
        $ tar cvf somefile.tar *.c
        
          (archive and gzip-compress all files in a directory into one archive file)
        $ tar czvf somefile.tar.gz somedirectory/
        
          (archive and bzip2-compress all files in a directory into one archive file)
        $ tar cjvf somefile.tar.bz2 somedirectory/
        
        
        Other arguments for tar can be explored by using the man tar command.
      • gzip   (more information)
        The standard compression system for all GNU software.
        Examples:
          (compress file somefile - also removes uncompressed file)
        $ gzip somefile
        
          (uncompress file somefile.gz - also removes compressed file)
        $ gunzip somefile.gz
        
      • bzip2   (more information)
        Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.
        Examples:
          (compress file somefile - also removes uncompressed file)
        $ bzip2 somefile
        
          (uncompress file somefile.bz2 - also removes compressed file)
        $ bunzip2 somefile.bz2
        

      There are several other, less commonly used, options available as well:

      • zip
      • 7zip
      • xz

    • 4.5  File Transfer

      4.5  File Transfer

      There are a variety of ways to transfer data to and from ITaP research systems. Which you should use depends on several factors, including the ease of use for you personally, connection speed and bandwidth, and the size and number of files which you intend to transfer.

      • 4.5.1  FTP

        4.5.1  FTP

        ITaP does not support FTP on any ITaP research systems because it does not allow for secure transmission of data. Try using one of the other methods described below instead of FTP.

      • 4.5.2  SCP

        4.5.2  SCP

        SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH protocol. SCP is available as a protocol choice in some graphical file transfer programs and also as a command line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.

        Command-line usage:

          (to a remote system from local)
        $ scp sourcefilename myusername@hostname:somedirectory/destinationfilename
        
          (from a remote system to local)
        $ scp myusername@hostname:somedirectory/sourcefilename destinationfilename
        
          (recursive directory copy to a remote system from local)
        $ scp -r sourcedirectory/ myusername@hostname:somedirectory/
        

        Linux / Solaris / AIX / HP-UX / Unix:

        • The "scp" command-line program should already be installed.

        Microsoft Windows:

        • WinSCP is a full-featured and free graphical SCP and SFTP client.
        • PuTTY also offers "pscp.exe", which is an extremely small program and a basic SCP client.
        • Secure FX is a commercial SCP and SFTP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

        Mac OS X:

        • The "scp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
      • 4.5.3  SFTP

        4.5.3  SFTP

        SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Command-line SFTP cannot recursively copy directory contents; to do so, try using SCP or a graphical SFTP client.

        Command-line usage:

        $ sftp -B buffersize myusername@hostname
        
              (to a remote system from local)
        sftp> put sourcefile somedir/destinationfile
        sftp> put -P sourcefile somedir/
        
              (from a remote system to local)
        sftp> get sourcefile somedir/destinationfile
        sftp> get -P sourcefile somedir/
        
        sftp> exit
        
        • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
        • -P: optional, preserve file attributes and permissions

        Linux / Solaris / AIX / HP-UX / Unix:

        • The "sftp" command line program should already be installed.

        Microsoft Windows:

        • WinSCP is a full-featured and free graphical SFTP and SCP client.
        • PuTTY also offers "psftp.exe", which is an extremely small program and a basic SFTP client.
        • Secure FX is a commercial SFTP and SCP client which is freely available to Purdue students, faculty, and staff with a Purdue career account.

        Mac OS X:

        • The "sftp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
        • MacSFTP is a free graphical SFTP client for Macs.
      • 4.5.4  Globus

        4.5.4  Globus

        Globus, previously known as Globus Online, is a powerful and easy-to-use file transfer service that is useful for transferring files virtually anywhere. It works within ITaP's various research storage systems; it connects ITaP and remote research sites running Globus; and it connects research systems to personal systems. You may use Globus to connect to your home, scratch, and Fortress storage directories. Since Globus is web-based, it works on any operating system that is connected to the internet. The Globus Personal client is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer, but it can also be used over the command line.

        Globus Web:

        • Navigate to http://transfer.rcac.purdue.edu
        • Click "Proceed" to log in with your Purdue Career Account.
        • On your first login it will ask to make a connection to a Globus account. If you already have one, sign in to associate it with your Career Account. Otherwise, click the link to create a new account.
        • Now you're at the main screen. Click "File Transfer" which will bring you to a two-endpoint interface.
        • The endpoint for disk-based storage is named "purdue#rcac"; you can start typing "purdue" and it will autocomplete.
        • The paths to research storage are the same as they are when you're logged into the clusters, but are provided below for reference.
          • Home directory: /~/
          • Scratch directory: /scratch/carter/m/myusername where m is the first letter of your username and myusername is your career account name.
          • Research Data Depot directory: /depot/mygroupname where mygroupname is the name of your group.
          • Fortress can be accessed at the "purdue#fortress" endpoint.

        • For the second endpoint, you can choose any other Globus endpoint, such as another research site, or a Globus Personal endpoint, which will allow you to transfer to a personal workstation or laptop.

        Globus Personal Client setup:

        • On the endpoint page from earlier, click "Get Globus Connect Personal" or download it from here: Globus Connect Personal
        • Name this particular personal system and click "Generate Setup Key" on this page: Create Globus Personal endpoint
        • Copy the key and paste it into the setup box when installing the client for your system.
        • Your personal system is now available as an endpoint within the Globus transfer interface.

        Globus Command Line:

        For more information, please see Globus Support.

      • 4.5.5  Windows Network Drive / SMB

        4.5.5  Windows Network Drive / SMB

        SMB (Server Message Block), also known as CIFS, is an easy-to-use file transfer protocol that is useful for transferring files between ITaP research systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer, but it can also be used over the command line.

        Note: to access Carter through SMB file sharing, you must be on a Purdue campus network or connected through VPN.

        Windows:

        • Windows 7: Click Windows menu > Computer, then click Map Network Drive in the top bar
        • Windows 8.1: Tap the Windows key, type computer, select This PC, click Computer > Map Network Drive in the top bar
        • In the folder location enter the following information and click Finish:

          • To access your home directory, enter \\samba.rcac.purdue.edu\myusername where myusername is your career account name.
          • To access your scratch space on Carter, enter \\samba.rcac.purdue.edu\scratch. Once mapped, you will be able to navigate to carter\m\myusername where m is the first letter of your username and myusername is your career account name. You may also navigate to any of the other cluster scratch directories from this drive mapping.
          • To access your Fortress long-term storage home directory, enter \\fortress-smb.rcac.purdue.edu\myusername where myusername is your career account name.
          • To access a shared Fortress group storage directory, enter \\fortress-smb.rcac.purdue.edu\group\mygroupname where mygroupname is the name of the shared group space.

        • You may be prompted for login information. Enter your username as onepurdue\myusername and your account password. If you omit the onepurdue prefix, you will not be able to log in.
        • Your home, scratch, or Fortress directory should now be mounted as a drive in the Computer window.

        Mac OS X:

        • In the Finder, click Go > Connect to Server
        • In the Server Address enter the following information and click Connect:

          • To access your home directory, enter smb://samba.rcac.purdue.edu/myusername where myusername is your career account name.
          • To access your scratch space on Carter, enter smb://samba.rcac.purdue.edu/scratch. Once connected, you will be able to navigate to carter/m/myusername where m is the first letter of your username and myusername is your career account name. You may also navigate to any of the other cluster scratch directories from this mount.
          • To access your Fortress long-term storage home directory, enter smb://fortress-smb.rcac.purdue.edu/myusername where myusername is your career account name.
          • To access a shared Fortress group storage directory, enter smb://fortress-smb.rcac.purdue.edu/group/mygroupname where mygroupname is the name of the shared group space.

        • You may be prompted for login information. Enter your username and password, and enter onepurdue as the domain; otherwise you will not be able to log in.

        Linux:

        • There are several graphical methods to connect in Linux depending on your desktop environment. Once you find out how to connect to a network server on your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
        • If you would like command-line access via Samba, you may install smbclient, which provides ftp-like access and can be used as shown below. SCP or SFTP is generally recommended over this approach. For the possible server addresses, see the Mac OS X instructions above.
          smbclient //samba.rcac.purdue.edu/myusername -U myusername -W onepurdue
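
          Once connected, smbclient presents an ftp-like prompt where you can list, download, and upload files. A brief illustrative session (myfilename is a placeholder):

          smb: \> ls
          smb: \> get myfilename
          smb: \> put myfilename
          smb: \> exit
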
  • 5  Applications on Carter

    5  Applications on Carter

    • 5.1  Provided Applications

      5.1  Provided Applications

      A catalog of available software on Carter is automatically generated from a list of software currently available via the module command. The catalog is organized by software categories such as compilers, libraries, and applications broken down by field of science. You may also compare software available across all ITaP Research Computing resources and search the catalog by keywords.

      Please contact rcac-help@purdue.edu if you are interested in the availability of software not shown in the catalog.

    • 5.2  Environment Management with the Module Command

      5.2  Environment Management with the Module Command

      ITaP uses the module command as the preferred method to manage your processing environment. With this command, you may load applications and compilers along with their libraries and paths. Modules are packages which you load and unload as needed.

      Please use the module command and do not manually configure your environment, as ITaP staff may make changes to the specifics of various packages. If you use the module command to manage your environment, these changes will not be noticeable.

      To view a brief usage report:

      $ module
      

      The following sections offer a short introduction to the module command. You can also refer to the module man page.

      • 5.2.1  List Available Modules

        5.2.1  List Available Modules

        To see what modules are available on this system:

        $ module avail
        

        To see which versions of a specific compiler are available on this system:

        $ module avail gcc
        $ module avail intel
        $ module avail pgi
        

        To see available modules for MPI libraries:

         $ module avail openmpi 
         $ module avail mvapich2    
         $ module avail impi    
        

        To see available modules for specific provided applications, use names from the list obtained with the command module avail:

        $ module avail abaqus
        $ module avail matlab
        $ module avail mathematica
        
      • 5.2.2  Load / Unload a Module

        5.2.2  Load / Unload a Module

        All modules consist of both a name and a version number. When loading a module, you may use only the name to load the default version, or you may specify which version you wish to load.

        For each cluster, ITaP makes a recommendation regarding the set of compiler, math library, and message-passing library for parallel code. To load the recommended set:

        $ module load devel
        

        To verify what you loaded:

        $ module list
        

        To load the default version of a specific compiler, choose one of the following commands:

        $ module load gcc
        $ module load intel
        $ module load pgi
        

        To load a specific version of the recommended compiler, include the version number:

        $ module load intel/13.1.1.163

        When running a job, you must load any relevant modules from within the job submission file so that they are loaded on the compute node(s) that run your job. Loading modules on the front end before submitting your job only affects your front-end session, which is sufficient during the development phase of your application but not during the production phase on the compute node(s). You must load the same modules on the compute node(s) themselves.
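
        For example, if you built your application with the recommended development set, a minimal job submission file (a sketch; myprogram is a placeholder for your own executable) would reload those modules before running it:

        #!/bin/sh -l
        # FILENAME:  myjobsubmissionfile

        # Load the same modules on the compute node that were used at compile time.
        module load devel

        # Change to the directory from which this job was submitted, then run.
        cd $PBS_O_WORKDIR
        ./myprogram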

        To unload a module, enter the same module name used to load that module. Unloading will attempt to undo the environmental changes which a previous load command installed.

        To unload the default version of a specific compiler:

        $ module unload gcc
        $ module unload intel
        $ module unload pgi
        

        To unload a specific version of the recommended compiler, include the same version number used to load that Intel compiler:

        $ module unload intel/13.1.1.163

        Apply the same methods to manage the modules of provided applications:

        $ module load matlab
        $ module unload matlab
        

        To unload all currently loaded modules:

        $ module purge
        
      • 5.2.3  List Currently Loaded Modules

        5.2.3  List Currently Loaded Modules

        To see currently loaded modules:

        $ module list
        Currently Loaded Modulefiles:
          1) intel/12.1
        

        To unload a module:

        $ module unload intel
        $ module list
        No Modulefiles Currently Loaded.
        
      • 5.2.4  Show Module Details

        5.2.4  Show Module Details

        To learn more about what a module does to your environment, you may use the module show module_name command, where module_name is any name in the list from command module avail. This can be either a default name, such as "intel", "gcc", "pgi", or "matlab", or a specific version of a module, such as "intel/11.1.072". Here is an example showing what loading the default Intel compiler does to the processing environment:

        $ module show intel
        -------------------------------------------------------------------
        /opt/modules/modulefiles/intel/12.1:
        
        module-whatis    invoke Intel 12.1.0 Compilers (64-bit)
        prepend-path     PATH /opt/intel/composer_xe_2011_sp1.6.233/bin/intel64
        prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21
        prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64
        prepend-path     LD_LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
        prepend-path     LIBRARY_PATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/lib/intel64/cc4.1.0_libc2.4_kernel2.6.16.21
        prepend-path     NLSPATH /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/locale/%l_%t/%N
        prepend-path     NLSPATH /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64/locale/%l_%t/%N
        prepend-path     CPATH /opt/intel/composer_xe_2011_sp1.6.233/tbb/include
        setenv           CC icc
        setenv           CXX icpc
        setenv           FC ifort
        setenv           ICC_HOME /opt/intel/composer_xe_2011_sp1.6.233
        setenv           IFORT_HOME /opt/intel/composer_xe_2011_sp1.6.233
        setenv           MKL_HOME /opt/intel/composer_xe_2011_sp1.8.273/mkl
        setenv           TBBROOT /opt/intel/composer_xe_2011_sp1.6.233/tbb
        setenv           LAPACK_INCLUDE -I/opt/intel/composer_xe_2011_sp1.8.273/mkl/include
        setenv           LAPACK_INCLUDE_F95 -I/opt/intel/composer_xe_2011_sp1.8.273/mkl/include/intel64/lp64
        setenv           LINK_LAPACK -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
        setenv           LINK_LAPACK_STATIC -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Bstatic -Wl,--start-group /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
        setenv           LINK_LAPACK95 -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
        setenv           LINK_LAPACK95_STATIC -L/opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64 -L/opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -Bstatic -Wl,--start-group /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_lp64.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/composer_xe_2011_sp1.8.273/mkl/lib/intel64/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
        -------------------------------------------------------------------
        

        To show what loading a specific Intel compiler version does to the processing environment:

        $ module show intel/11.1.072
        -------------------------------------------------------------------
        /opt/modules/modulefiles/intel/11.1.072:
        
        module-whatis    invoke Intel 11.1.072 64-bit Compilers
        prepend-path     PATH /opt/intel/Compiler/11.1/072/bin/intel64
        prepend-path     LD_LIBRARY_PATH /opt/intel/mkl/10.2.5.035/lib/em64t
        prepend-path     LD_LIBRARY_PATH /opt/intel/Compiler/11.1/072/lib/intel64
        prepend-path     NLSPATH /opt/intel/mkl/10.2.5.035/lib/em64t/locale/%l_%t/%N
        prepend-path     NLSPATH /opt/intel/Compiler/11.1/072/idb/intel64/locale/%l_%t/%N
        prepend-path     NLSPATH /opt/intel/Compiler/11.1/072/lib/intel64/locale/%l_%t/%N
        setenv           CC icc
        setenv           CXX icpc
        setenv           FC ifort
        setenv           F90 ifort
        setenv           ICC_HOME /opt/intel/Compiler/11.1/072
        setenv           IFORT_HOME /opt/intel/Compiler/11.1/072
        setenv           MKL_HOME /opt/intel/mkl/10.2.5.035
        setenv           LAPACK_INCLUDE -I/opt/intel/mkl/10.2.5.035/include
        setenv           LAPACK_INCLUDE_F95 -I/opt/intel/mkl/10.2.5.035/include/em64t/lp64
        setenv           LINK_LAPACK -L/opt/intel/mkl/10.2.5.035/lib/em64t -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/mkl/10.2.5.035/lib/em64t
        setenv           LINK_LAPACK_STATIC -Bstatic -Wl,--start-group /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread
        setenv           LINK_LAPACK95 -L/opt/intel/mkl/10.2.5.035/lib/em64t -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -Xlinker -rpath -Xlinker /opt/intel/mkl/10.2.5.035/lib/em64t
        setenv           LINK_LAPACK95_STATIC -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -Bstatic -Wl,--start-group /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_lp64.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_intel_thread.a /opt/intel/mkl/10.2.5.035/lib/em64t/libmkl_core.a -Wl,--end-group -Bdynamic -liomp5 -lpthread
        -------------------------------------------------------------------
        
  • 6  Compiling Source Code on Carter

    6  Compiling Source Code on Carter

    • 6.1  Provided Compilers

      6.1  Provided Compilers

      Compilers are available on Carter for Fortran, C, and C++. Compiler sets from Intel, GNU, and PGI are installed. A full list of compiler versions installed on Carter is available in the software catalog. More detailed documentation on each compiler set available on Carter follows.

      On Carter, ITaP recommends the following set of compiler, math library, and message-passing library for parallel code:

      • Intel 13.1.1.163
      • MKL
      • OpenMPI 1.6.3

      To load the recommended set:

      $ module load devel
      $ module list
      
      • 6.1.1  Intel Compiler Set

        6.1.1  Intel Compiler Set

        One or more versions of the Intel compiler set (compilers and associated libraries) are available on Carter. To discover which ones:

        $ module avail intel
        

        Choose an appropriate Intel module and load it. For example:

        $ module load intel
        

        Here are some examples for the Intel compilers:

        Language Serial Program MPI Program OpenMP Program
        Fortran77
        $ ifort myprogram.f -o myprogram
        
        $ mpiifort myprogram.f -o myprogram
        
        $ ifort -openmp myprogram.f -o myprogram
        
        Fortran90
        $ ifort myprogram.f90 -o myprogram
        
        $ mpiifort myprogram.f90 -o myprogram
        
        $ ifort -openmp myprogram.f90 -o myprogram
        
        Fortran95 (same as Fortran 90) (same as Fortran 90) (same as Fortran 90)
        C
        $ icc myprogram.c -o myprogram
        
        $ mpiicc myprogram.c -o myprogram
        
        $ icc -openmp myprogram.c -o myprogram
        
        C++
        $ icpc myprogram.cpp -o myprogram
        
        $ mpiicpc myprogram.cpp -o myprogram
        
        $ icpc -openmp myprogram.cpp -o myprogram
        

        More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, or online here:

        For more documentation on the Intel compilers:

      • 6.1.2  GNU Compiler Set

        6.1.2  GNU Compiler Set

        The official name of the GNU compilers is "GNU Compiler Collection" or "GCC". One or more versions of the GNU compiler set (compilers and associated libraries) are available on Carter. To discover which ones:

        $ module avail gcc
        

        Choose an appropriate GCC module and load it. For example:

        $ module load gcc
        

        An older version of the GNU compiler will be in your path by default. Do NOT use this version. Instead, load a newer version with the command module load gcc.
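
        To confirm which GCC is in your path, you can check the reported version before and after loading the module (the exact version strings will vary):

        $ gcc --version    # system default GCC
        $ module load gcc
        $ gcc --version    # newer GCC provided by the module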

        Here are some examples for the GNU compilers:

        Language Serial Program MPI Program OpenMP Program
        Fortran77
        $ gfortran myprogram.f -o myprogram
        
        $ mpif77 myprogram.f -o myprogram
        
        $ gfortran -fopenmp myprogram.f -o myprogram
        
        Fortran90
        $ gfortran myprogram.f90 -o myprogram
        
        $ mpif90 myprogram.f90 -o myprogram
        
        $ gfortran -fopenmp myprogram.f90 -o myprogram
        
        Fortran95
        $ gfortran myprogram.f95 -o myprogram
        
        $ mpif90 myprogram.f95 -o myprogram
        
        $ gfortran -fopenmp myprogram.f95 -o myprogram
        
        C
        $ gcc myprogram.c -o myprogram
        
        $ mpicc myprogram.c -o myprogram
        
        $ gcc -fopenmp myprogram.c -o myprogram
        
        C++
        $ g++ myprogram.cpp -o myprogram
        
        $ mpiCC myprogram.cpp -o myprogram
        
        $ g++ -fopenmp myprogram.cpp -o myprogram
        

        More information on compiler options appears in the official man pages, which are accessible with the man command after loading the appropriate compiler module, or online here:

        For more documentation on the GCC compilers:

      • 6.1.3  PGI Compiler Set

        6.1.3  PGI Compiler Set

        One or more versions of the PGI compiler set (compilers and associated libraries) are available on Carter. To discover which ones:

        $ module avail pgi
        

        Choose an appropriate PGI module and load it. For example:

        $ module load pgi
        

        Here are some examples for the PGI compilers:

        Language Serial Program MPI Program OpenMP Program
        Fortran77
        $ pgf77 myprogram.f -o myprogram
        
        $ mpif77 myprogram.f -o myprogram
        
        $ pgf77 -mp myprogram.f -o myprogram
        
        Fortran90
        $ pgf90 myprogram.f90 -o myprogram
        
        $ mpif90 myprogram.f90 -o myprogram
        
        $ pgf90 -mp myprogram.f90 -o myprogram
        
        Fortran95
        $ pgf95 myprogram.f95 -o myprogram
        
        $ mpif90 myprogram.f95 -o myprogram
        
        $ pgf95 -mp myprogram.f95 -o myprogram
        
        C
        $ pgcc myprogram.c -o myprogram
        
        $ mpicc myprogram.c -o myprogram
        
        $ pgcc -mp myprogram.c -o myprogram
        
        C++
        $ pgCC myprogram.cpp -o myprogram
        
        $ mpiCC myprogram.cpp -o myprogram
        
        $ pgCC -mp myprogram.cpp -o myprogram
        

        More information on compiler options can be found in the official man pages, which are accessible with the man command after loading the appropriate compiler module, or online here:

        For more documentation on the PGI compilers:

    • 6.2  Compiling Serial Programs

      6.2  Compiling Serial Programs

      A serial program is a single process which executes as a sequential stream of instructions on one computer. Compilers capable of serial programming are available for C, C++, and versions of Fortran.

      Here are a few sample serial programs:

      To load a compiler, enter one of the following:

      $ module load intel
      $ module load gcc
      $ module load pgi
      

      The following table illustrates how to compile your serial program:

      Language Intel Compiler GNU Compiler PGI Compiler
      Fortran 77
      $ ifort myprogram.f -o myprogram
      
      $ gfortran myprogram.f -o myprogram
      
      $ pgf77 myprogram.f -o myprogram
      
      Fortran 90
      $ ifort myprogram.f90 -o myprogram
      
      $ gfortran myprogram.f90 -o myprogram
      
      $ pgf90 myprogram.f90 -o myprogram
      
      Fortran 95
      $ ifort myprogram.f90 -o myprogram
      
      $ gfortran myprogram.f95 -o myprogram
      
      $ pgf95 myprogram.f95 -o myprogram
      
      C
      $ icc myprogram.c -o myprogram
      
      $ gcc myprogram.c -o myprogram
      
      $ pgcc myprogram.c -o myprogram
      
      C++
      $ icc myprogram.cpp -o myprogram
      
      $ g++ myprogram.cpp -o myprogram
      
      $ pgCC myprogram.cpp -o myprogram
      

      The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".
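
      If you want to verify that a compile succeeded, you can check the compiler's exit status (0 indicates success) or simply look for the new executable. An illustrative example with the Intel compiler (tcsh/csh users: use echo $status instead of echo $?):

      $ icc myprogram.c -o myprogram
      $ echo $?
      0
      $ ls -l myprogram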

    • 6.3  Compiling MPI Programs

      6.3  Compiling MPI Programs

      OpenMPI, MVAPICH2, and Intel MPI (IMPI) are implementations of the Message-Passing Interface (MPI) standard. Libraries for these MPI implementations and compilers for C, C++, and Fortran are available on Carter. A full list of MPI library versions installed on Carter is available in the software catalog.

      MPI programs require including a header file:

      Language Header Files
      Fortran 77
      INCLUDE 'mpif.h'
      
      Fortran 90
      INCLUDE 'mpif.h'
      
      Fortran 95
      INCLUDE 'mpif.h'
      
      C
      #include <mpi.h>
      
      C++
      #include <mpi.h>
      

      Here are a few sample programs using MPI:

      To see the available MPI libraries:

       $ module avail openmpi 
       $ module avail mvapich2    
       $ module avail impi    
      

      The following table illustrates how to compile your MPI program. Any compiler flags accepted by Intel ifort/icc compilers are compatible with their respective MPI compiler.

      Language Intel MPI OpenMPI , MVAPICH2 , or Intel MPI (IMPI)
      Fortran 77
      $ mpiifort program.f -o program
      
      $ mpif77 program.f -o program
      
      Fortran 90
      $ mpiifort program.f90 -o program
      
      $ mpif90 program.f90 -o program
      
      Fortran 95
      $ mpiifort program.f95 -o program
      
      $ mpif90 program.f95 -o program
      
      C
      $ mpiicc program.c -o program
      
      $ mpicc program.c -o program
      
      C++
      $ mpiicpc program.C -o program
      
      $ mpiCC program.C -o program
      

      The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".
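
      Ordinary compiler flags simply pass through the MPI wrapper to the underlying compiler. For example, an illustrative compile adding an optimization flag (the -O3 flag here is only an example):

      $ mpiicc -O3 program.c -o program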

      Here is some more documentation from other sources on the MPI libraries:

    • 6.4  Compiling OpenMP Programs

      6.4  Compiling OpenMP Programs

      All compilers installed on Carter include OpenMP functionality for C, C++, and Fortran. An OpenMP program is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over processor cores in a single compute node without the need for MPI communications.

      OpenMP programs require including a header file:

      Language Header Files
      Fortran 77
      INCLUDE 'omp_lib.h'
      
      Fortran 90
      use omp_lib
      
      Fortran 95
      use omp_lib
      
      C
      #include <omp.h>
      
      C++
      #include <omp.h>
      

      Sample programs illustrate task parallelism of OpenMP:

      A sample program illustrates loop-level (data) parallelism of OpenMP:

      To load a compiler, enter one of the following:

      $ module load intel
      $ module load gcc
      $ module load pgi
      

      The following table illustrates how to compile your shared-memory program. Any compiler flags accepted by the ifort/icc compilers may also be used alongside the OpenMP option.

      Language Intel Compiler GNU Compiler PGI Compiler
      Fortran 77
      $ ifort -openmp myprogram.f -o myprogram
      
      $ gfortran -fopenmp myprogram.f -o myprogram
      
      $ pgf77 -mp myprogram.f -o myprogram
      
      Fortran 90
      $ ifort -openmp myprogram.f90 -o myprogram
      
      $ gfortran -fopenmp myprogram.f90 -o myprogram
      
      $ pgf90 -mp myprogram.f90 -o myprogram
      
      Fortran 95
      $ ifort -openmp myprogram.f90 -o myprogram
      
      $ gfortran -fopenmp myprogram.f95 -o myprogram
      
      $ pgf95 -mp myprogram.f95 -o myprogram
      
      C
      $ icc -openmp myprogram.c -o myprogram
      
      $ gcc -fopenmp myprogram.c -o myprogram
      
      $ pgcc -mp myprogram.c -o myprogram
      
      C++
      $ icc -openmp myprogram.cpp -o myprogram
      
      $ g++ -fopenmp myprogram.cpp -o myprogram
      
      $ pgCC -mp myprogram.cpp -o myprogram
      

      The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".
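
      At run time, the number of threads an OpenMP program uses is typically controlled with the standard OMP_NUM_THREADS environment variable; 16 below simply matches the number of cores on a Carter compute node (myprogram is a placeholder):

      $ export OMP_NUM_THREADS=16
      # (tcsh/csh users: setenv OMP_NUM_THREADS 16)
      $ ./myprogram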

      Here is some more documentation from other sources on OpenMP:

    • 6.5  Compiling Hybrid Programs

      6.5  Compiling Hybrid Programs

      A hybrid program combines both MPI and shared-memory to take advantage of compute clusters with multi-core compute nodes. Libraries for OpenMPI, MVAPICH2, and Intel MPI (IMPI) and compilers which include OpenMP for C, C++, and Fortran are available.

      Hybrid programs require including header files:

      Language Header Files
      Fortran 77
      INCLUDE 'omp_lib.h'
      INCLUDE 'mpif.h'
      
      Fortran 90
      use omp_lib
      INCLUDE 'mpif.h'
      
      Fortran 95
      use omp_lib
      INCLUDE 'mpif.h'
      
      C
      #include <mpi.h>
      #include <omp.h>
      
      C++
      #include <mpi.h>
      #include <omp.h>
      

      A few examples illustrate hybrid programs with task parallelism of OpenMP:

      This example illustrates a hybrid program with loop-level (data) parallelism of OpenMP:

      To see the available MPI libraries:

       $ module avail openmpi 
       $ module avail mvapich2    
       $ module avail impi    
      

      The following table illustrates how to compile your hybrid (MPI/OpenMP) program. Any compiler flags accepted by Intel ifort/icc compilers are compatible with their respective MPI compiler.

      Language Intel MPI OpenMPI , MVAPICH2 , or Intel MPI (IMPI) with Intel Compiler
      Fortran 77
      $ mpiifort -openmp myprogram.f -o myprogram
      
      $ mpif77 -openmp myprogram.f -o myprogram
      
      Fortran 90
      $ mpiifort -openmp myprogram.f90 -o myprogram
      
      $ mpif90 -openmp myprogram.f90 -o myprogram
      
      Fortran 95
      $ mpiifort -openmp myprogram.f90 -o myprogram
      
      $ mpif90 -openmp myprogram.f90 -o myprogram
      
      C
      $ mpiicc -openmp myprogram.c -o myprogram
      
      $ mpicc -openmp myprogram.c -o myprogram
      
      C++
      $ mpiicpc -openmp myprogram.C -o myprogram
      
      $ mpiCC -openmp myprogram.C -o myprogram
      
      Language OpenMPI , MVAPICH2 , or Intel MPI (IMPI) with GNU Compiler OpenMPI , MVAPICH2 , or Intel MPI (IMPI) with PGI Compiler
      Fortran 77
      $ mpif77 -fopenmp myprogram.f -o myprogram
      
      $ mpif77 -mp myprogram.f -o myprogram
      
      Fortran 90
      $ mpif90 -fopenmp myprogram.f90 -o myprogram
      
      $ mpif90 -mp myprogram.f90 -o myprogram
      
      Fortran 95
      $ mpif90 -fopenmp myprogram.f95 -o myprogram
      
      $ mpif90 -mp myprogram.f95 -o myprogram
      
      C
      $ mpicc -fopenmp myprogram.c -o myprogram
      
      $ mpicc -mp myprogram.c -o myprogram
      
      C++
      $ mpiCC -fopenmp myprogram.C -o myprogram
      
      $ mpiCC -mp myprogram.C -o myprogram
      

      The Intel, GNU and PGI compilers will not output anything for a successful compilation. Also, the Intel compiler does not recognize the suffix ".f95".

    • 6.6  Compiling GPGPU/CUDA Programs

      6.6  Compiling GPGPU/CUDA Programs

      A graphics processing unit (GPU) is a specialized, single-chip, parallel processor designed to accelerate a wide range of compute-intensive applications. GPGPU computing uses a GPU as a coprocessor to the CPU. Acceleration comes from offloading some of the compute-intensive and time-consuming portions of the code to the GPU; the rest of the application still runs on the CPU. This model is similar to the master/worker threads of shared-memory programming with OpenMP. The GPU's highly parallel SIMD (Single Instruction, Multiple Data) architecture, with its wide vector width, makes it ideally suited to data parallelism and can yield performance several orders of magnitude greater than a conventional CPU for suitable workloads.

      The Carter cluster has twelve compute nodes each with three NVIDIA Tesla M2090 GPUs. Each Tesla GPU contains 512 stream processors and 6GB of memory.

      There are two main methods for programming an NVIDIA GPU. The first and most widely used is CUDA, which NVIDIA developed specifically for GPU computing on its cards. The other is the OpenCL standard, which is newer and more widely supported across devices, but not yet as widely used. This section focuses on CUDA since it is currently more relevant to NVIDIA GPUs and has a slightly easier learning curve.

      A simple CUDA program has a basic workflow:

      • 1) Initialize an array on the host (CPU).
      • 2) Copy array from host memory to GPU memory.
      • 3) Apply an operation to array on GPU.
      • 4) Copy array from GPU memory to host memory.

      Here is a sample CUDA program:

      Both front-ends and GPU-enabled compute nodes have the CUDA tools and libraries available to compile CUDA programs. To compile a CUDA program, load the CUDA module and use nvcc to compile the program:

      $ module load cuda
      $ cd myworkingdirectory
      $ nvcc gpu_hello.cu -o gpu_hello
      

      The example illustrates only how to copy an array between a CPU and its GPU but does not perform a serious computation.

      Applying the basic workflow to even a highly parallel algorithm is relatively easy but achieves only a low level of parallelism and a modest performance improvement. GPU performance is also sensitive: a small change to the code can change performance dramatically. To achieve the full potential of a GPU, you must fashion your algorithm to match the GPU's architecture. This requires asynchronous function calls, streams, and an understanding of the several ways GPU technology may vary:

      • number of multiprocessors
      • memory bandwidth
      • shared memory size
      • register file size
      • threads per block
      • coalesced thread operations

      Parameterization of code addressing these GPU variables aids adapting code to different GPUs as the technology evolves.

      The following program exhibits the power of a GPU by timing three square matrix multiplications on a CPU and on the global and shared memory of a GPU:

      The speedups are dramatic:

                                                                  speedup
                                                                  -------
      Elapsed time in CPU:                    8435.3 milliseconds
      Elapsed time in GPU (global memory):      46.9 milliseconds  180.0
      Elapsed time in GPU (shared memory):      29.9 milliseconds  282.3
      

      For best performance, the input array or matrix must be sufficiently large to overcome the overhead in copying the input and output data to and from the GPU.

      For more information about NVIDIA, CUDA, and GPUs:

    • 6.7  Provided Libraries

      6.7  Provided Libraries

      Some mathematical libraries are available on Carter. More detailed documentation about the libraries available on Carter follows.

      • 6.7.1  Intel Math Kernel Library (MKL)

        6.7.1  Intel Math Kernel Library (MKL)

        Intel Math Kernel Library (MKL) contains ScaLAPACK, LAPACK, Sparse Solver, BLAS, Sparse BLAS, CBLAS, GMP, FFTs, DFTs, VSL, VML, and Interval Arithmetic routines. MKL resides in the directory stored in the environment variable MKL_HOME, after loading a version of the Intel compiler with module.

        By using module load to activate an Intel compiler, your shell environment will have several variables set to help link applications with MKL. Here are some example combinations of simplified linking options:

        $ module load intel
        $ echo $LINK_LAPACK
        -L${MKL_HOME}/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
        
        $ echo $LINK_LAPACK95
        -L${MKL_HOME}/lib/intel64 -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
        

        ITaP recommends you use the provided variables to define MKL linking options in your compiling procedures. The Intel compiler modules also provide two other environment variables, LINK_LAPACK_STATIC and LINK_LAPACK95_STATIC, that you may use if you need to link MKL statically.
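
        For example, a sketch of linking a Fortran program against MKL using one of these variables (myprogram.f90 is a placeholder for code that calls LAPACK/BLAS routines):

        $ module load intel
        $ ifort myprogram.f90 -o myprogram $LINK_LAPACK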

        ITaP recommends that you use dynamic linking of libguide. If so, define LD_LIBRARY_PATH such that you are using the correct version of libguide at run time. If you use static linking of libguide (discouraged), then:

        • If you use the Intel compilers, link in the libguide version that comes with the compiler (use the -openmp option).
        • If you do not use the Intel compilers, link in the libguide version that comes with the Intel MKL above.

        Here is some more documentation from other sources on the Intel MKL:

    • 6.8  Mixing Fortran, C, and C++ Code on Unix

      6.8  Mixing Fortran, C, and C++ Code on Unix

      You may write different parts of a computing application in different programming languages. For example, an application might incorporate older, legacy code which performs numerical calculations written in Fortran. Systems functions might use C. A newer, main program which binds together all older code might use C++ to take advantage of the object orientation. This section illustrates a few simple examples.

      For more information about mixing programming languages:

    • 6.9  Using cpp with Fortran

      6.9  Using cpp with Fortran


      • 6.9.1  Using cpp with Fortran

        6.9.1  Using cpp with Fortran

        If the source file ends with .F, .fpp, or .FPP, cpp automatically preprocesses the source code before compilation. If you want to use the C preprocessor with source files that do not end with .F, use the following compiler option to specify the filename suffix:

        • GNU Compilers: -x f77-cpp-input
          Note that preprocessing does not extend to the contents of files included by an "INCLUDE" directive. You must use the #include preprocessor directive instead.
          For example, to preprocess source files that end with .f:
          $ gfortran -x f77-cpp-input myprogram.f
          
        • Intel Compilers: -cpp
          To tell the compiler to link using C++ runtime libraries included with gcc/icc:
          $ ... -cxxlib-gcc / -cxxlib-icc
          
          For example, to preprocess source files that end with .f:
          $ ifort -cpp myprogram.f
          

        Generally, it is advisable to rename your file from myprogram.f to myprogram.F. The preprocessor then automatically runs when you compile the file.

        For more information on combining C/C++ and Fortran:

      • 6.9.2  C Program Calling Subroutines in Fortran, C, and C++

        6.9.2  C Program Calling Subroutines in Fortran, C, and C++

        A C language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

        To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine. The C program calls the Fortran routine with the underscore character.
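
        A hedged illustration of checking the name mangling with nm (the symbol subr_f_ corresponds to the Fortran routine used in this example; your symbol names will differ):

        $ nm f90.o
        0000000000000000 T subr_f_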

        Fortran uses pass-by-reference while C uses pass-by-value. Therefore, to pass a value from a Fortran routine to a C program requires the argument in the call to the Fortran routine to be a pointer (ampersand "&"). To pass a value from a C++ routine to a C program, the C++ routine may use the pass-by-reference syntax (ampersand "&") of C++ while the C program again specifies a pointer (ampersand "&") in the call to the C++ routine.

        The C++ compiler must know at the time of compiling the C++ routine that the C program will invoke the C++ routine with the C-style interface rather than the C++ interface.

        The following files of source code illustrate these technical details:

        Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

        Compiler Intel GNU PGI
        C Main Program
        $ module load intel
        $ icc -c main.c
        $ ifort -c f90.f90
        $ icc -c c.c
        $ icc -c cpp.cpp
        $ icc -lstdc++ main.o f90.o c.o cpp.o
        
        $ module load gcc
        $ gcc -c main.c
        $ gfortran -c f90.f90
        $ gcc -c c.c
        $ g++ -c cpp.cpp
        $ gcc -lstdc++ main.o f90.o c.o cpp.o
        
        $ module load pgi
        $ pgcc -c main.c
        $ pgcc -c c.c
        $ pgCC -c cpp.cpp
        $ pgf90 -Mnomain main.o c.o cpp.o f90.f90
        
        

        The results show that each routine successfully returns a different character to the main program:

        $ a.out
        main(), initial value:               chr=X
        main(), after function subr_f_():    chr=f
        main(), after function func_c():     chr=c
        main(), after function func_cpp():   chr=+
        Exit main.c
        
      • 6.9.3  C++ Program Calling Subroutines in Fortran, C, and C++

        6.9.3  C++ Program Calling Subroutines in Fortran, C, and C++

        A C++ language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

        To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine. The C++ program calls the Fortran routine with the underscore character.

        Fortran uses pass-by-reference while C++ uses pass-by-value. Therefore, to pass a value from a Fortran routine to a C++ program requires the argument in the call to the Fortran routine to be a pointer (ampersand "&"). To pass a value from a C routine to a C++ program, the C routine must declare a parameter as a pointer (asterisk "*") while the C++ program again specifies a pointer (ampersand "&") in the call to the C routine.

        The C++ compiler must know at the time of compiling the C++ program that the C++ program will invoke the Fortran and C routines with the C-style interface rather than the C++ interface.

        The following files of source code illustrate these technical details:

        Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

        Compiler Intel GNU PGI
        C++ Main Program
        $ module load intel
        $ icc -c main.cpp
        $ ifort -c f90.f90
        $ icc -c c.c
        $ icc -c cpp.cpp
        $ icc -lstdc++ main.o f90.o c.o cpp.o
        
        $ module load gcc
        $ g++ -c main.cpp
        $ gfortran -c f90.f90
        $ gcc -c c.c
        $ g++ -c cpp.cpp
        $ g++ main.o f90.o c.o cpp.o
        
        $ module load pgi
        $ pgCC -c main.cpp
        $ pgf90 -c f90.f90
        $ pgcc -c c.c
        $ pgCC -c cpp.cpp
        $ pgCC -L../lib main.o c.o cpp.o f90.o -pgf90libs
        

        The results show that each routine successfully returns a different character to the main program:

        $ a.out
        main(), initial value:               chr=X
        main(), after function subr_f_():    chr=f
        main(), after function func_c():     chr=c
        main(), after function func_cpp():   chr=+
        Exit main.cpp
        
      • 6.9.4  Fortran Program Calling Subroutines in Fortran, C, and C++

        6.9.4  Fortran Program Calling Subroutines in Fortran, C, and C++

        A Fortran language program calls routines written in Fortran 90, C, and C++. The routines change the value of a character argument. To understand what makes this example work, you must be aware of a few simple issues.

        To discover how the chosen Fortran compiler handles the names of routines, apply the Linux command nm to the object file: nm filename.o. The Fortran compilers used in this example append an underscore after the name of a routine, so the definitions of the C and C++ routines must include the underscore. The Fortran program calls these routines without the underscore character in the Fortran source code.

        Fortran uses pass-by-reference while C uses pass-by-value. Therefore, to pass a value from a C routine to a Fortran program requires the parameter of the C routine to be a pointer (asterisk "*") in the C routine's definition. To pass a value from a C++ routine to a Fortran program, the C++ routine may use the pass-by-reference syntax (ampersand "&") of C++ in its definition.

        The C++ compiler must know at the time of compiling the C++ routine that the Fortran program will invoke the C++ routine with the C-style interface rather than the C++ interface.

        The following files of source code illustrate these technical details:

        Separately compile each source code file with the appropriate compiler into an object (.o) file. Then link the object files into a single executable file (a.out):

        Compiler Intel GNU PGI
        Fortran 90 Main Program
        $ module load intel
        $ ifort -c main.f90
        $ ifort -c f90.f90
        $ icc -c c.c
        $ icc -c cpp.cpp
        $ ifort -lstdc++ main.o f90.o c.o cpp.o
        
        $ module load gcc
        $ gfortran -c main.f90
        $ gfortran -c f90.f90
        $ gcc -c c.c
        $ g++ -c cpp.cpp
        $ gfortran -lstdc++ main.o c.o cpp.o f90.o
        
        $ module load pgi
        $ pgf90 -c main.f90
        $ pgf90 -c f90.f90
        $ pgcc -c c.c
        $ pgCC -c cpp.cpp
        $ pgf90 main.o c.o cpp.o f90.o
        

        The results show that each routine successfully returns a different character to the main program:

        $ a.out
         main(), initial value:               chr=X
         main(), after function subr_f():     chr=f
         main(), after function subr_c():     chr=c
         main(), after function func_cpp():   chr=+
         Exit mixlang
        
  • 7  Running Jobs on Carter

    7  Running Jobs on Carter

    There are two methods for submitting jobs to the Carter community cluster. First, you may use the portable batch system (PBS) to submit jobs directly to a queue on Carter. PBS performs job scheduling. Jobs may be serial, message-passing, shared-memory, or hybrid (message-passing + shared-memory) programs. You may use either the batch or interactive mode to run your jobs. Use the batch mode for finished programs; use the interactive mode only for debugging. Second, since the Carter cluster is a part of BoilerGrid, you may submit serial jobs to BoilerGrid and specifically request compute nodes on Carter.

    • 7.1  Running Jobs via PBS

      7.1  Running Jobs via PBS

      The Portable Batch System (PBS) is a richly featured workload management system providing a job scheduling and job management interface for computing resources, including Linux clusters. With PBS, a user requests resources and submits a job to a queue. The system will then take jobs from queues, allocate the necessary nodes, and execute them in as efficient a manner as it can.

      Do NOT run large, long, multi-threaded, parallel, or CPU-intensive jobs on a front-end login host. All users share the front-end hosts, and running anything but the smallest test job will negatively impact everyone's ability to use Carter. Always use PBS to submit your work as a job. You may even submit interactive sessions as jobs. This section of documentation will explain how to use PBS.

      • 7.1.1  Tips

        7.1.1  Tips

        • Remember that ppn can not be larger than the number of processor cores on each node.
        • If you compiled your own code, you must module load that same compiler from your job submission file. However, it is not necessary to load the standard compiler module if you load the corresponding compiler module with parallel libraries included.
        • To see a list of the nodes which ran your job: cat $PBS_NODEFILE
        • The order of processor cores is random. There is no way to tell which processor will do what or in which order in a parallel program.
        • If you use the tcsh and csh shells and if a .logout file exists in your home directory, the exit status of your jobs will be that of the .logout script, not the job submission file. This may impact any interjob dependencies. To preserve the job exit status, remove the .logout file.
      • 7.1.2  Queues

        7.1.2  Queues

        Carter, as a community cluster, has one or more queues dedicated to each partner who has purchased access to the cluster. These queues provide partners with priority access to their portion of the cluster. Additionally, community clusters provide a "standby" queue which is available to all cluster users. This "standby" queue allows users to utilize portions of the cluster that would otherwise be idle, but at a lower priority than partner-queue jobs, and with a relatively short time limit, to ensure "standby" jobs will not be able to tie up resources and prevent partner-queue jobs from running quickly.

        To see a list of all queues on Carter that you may submit to, use the qlist command:

        $ qlist
        
                                  Current Number of Cores
        Queue                 Total     Queue   Run     Free         Max Walltime
        ===============    ====================================     ==============
        myqueue                  16        32       8        8          720:00:00
        standby               9,584     7,384   4,678       98            4:00:00
        

        This lists each queue you can submit to, the number of cores allocated to the queue, the total number of cores queued in jobs waiting to run, how many cores are in use, and how many are available to run jobs. The maximum walltime you may request is also listed. This command can be used to get a general idea of how busy a queue is and how long you may have to wait for your job to start.

      • 7.1.3  Job Submission File

        7.1.3  Job Submission File

        To submit work to a PBS queue, you must first create a job submission file. This job submission file is essentially a simple shell script. It will set any required environment variables, load any necessary modules, create or modify files and directories in your scratch space, and invoke any applications that you need. However, a job submission file can be as simple as the path to your application:

        #!/bin/sh -l
        # FILENAME:  myjobsubmissionfile
        
        # Print the hostname of the compute node on which this job is running.
        /bin/hostname
        

        Or, as simple as listing the names of compute nodes assigned to your job:

        #!/bin/sh -l
        # FILENAME:  myjobsubmissionfile
        
        # PBS_NODEFILE contains the names of assigned compute nodes.
        cat $PBS_NODEFILE
        

        PBS sets several potentially useful environment variables which you may use within your job submission files. Here is a list of some:

        Name Description
        PBS_O_WORKDIR Absolute path of the current working directory when you submitted this job
        PBS_JOBID Job ID number assigned to this job by the batch system
        PBS_JOBNAME Job name supplied by the user
        PBS_NODEFILE File containing the list of nodes assigned to this job
        PBS_O_HOST Hostname of the system where you submitted this job
        PBS_O_QUEUE Name of the original queue to which you submitted this job
        PBS_O_SYSTEM Operating system name given by uname -s where you submitted this job
        PBS_ENVIRONMENT "PBS_BATCH" if this job is a batch job, or "PBS_INTERACTIVE" if this job is an interactive job

        Here is an example of a commonly used PBS variable, making sure a job runs from within the same directory that you submitted it from:

        #!/bin/sh -l
        # FILENAME:  myjobsubmissionfile
        
        # Change to the directory from which you originally submitted this job.
        cd $PBS_O_WORKDIR
        
        # Print out the current working directory path.
        pwd
        

        You may also find the need to load a module to run a job on a compute node. Loading a module on a front end does NOT automatically load that module on the compute node where a job runs. You must use the job submission file to load a module on the compute node:

        #!/bin/sh -l
        # FILENAME:  myjobsubmissionfile
        
        # Load the module for NetPBM.
        module load netpbm
        
        # Convert a PostScript file to GIF format using NetPBM tools.
        pstopnm myfilename.ps | ppmtogif > myfilename.gif
        
      • 7.1.4  Job Submission

        7.1.4  Job Submission

        Once you have a job submission file, you may submit this script to PBS using the qsub command. PBS will find an available processor core or a set of processor cores and run your job there, or leave your job in a queue until some become available. At submission time, you may also optionally specify many other attributes or job requirements you have regarding where your jobs will run.

        To submit your serial job to one processor core on one compute node with no special requirements:

        $ qsub myjobsubmissionfile
        

        To submit your job to a specific queue:

        $ qsub -q myqueuename myjobsubmissionfile
        

        By default, each job receives 30 minutes of wall time for its execution. The wall time is the total time in real clock time (not CPU cycles) that you believe your job will need to run to completion. If you know that your job will not need more than a certain amount of time to run, it is very much to your advantage to request less than the maximum allowable wall time, as this may allow your job to schedule and run sooner. To request the specific wall time of 1 hour and 30 minutes:

        $ qsub -l walltime=01:30:00 myjobsubmissionfile
        

        To submit your job with your currently-set environment variables:

        $ qsub -V myjobsubmissionfile
        

        The nodes resource indicates how many compute nodes you would like reserved for your job. The node property ppn specifies how many processor cores you need on each compute node. Each compute node in Carter has 16 processor cores. Detailed explanations regarding the distribution of your job across different compute nodes for parallel programs appear in the sections covering specific parallel programming libraries.

        To request 2 compute nodes with 4 processor cores per node:

        $ qsub -l nodes=2:ppn=4 myjobsubmissionfile
        

        Here is a typical list of compute node names from a qsub command requesting 2 compute nodes with 4 processor cores each (nodes=2:ppn=4):

        carter-a139
        carter-a139
        carter-a139
        carter-a139
        carter-a138
        carter-a138
        carter-a138
        carter-a138
        

        Note that if you request more than ppn=16 on Carter, your job will never run, because Carter compute nodes only have 16 processor cores each.

        Normally, compute nodes running your job may also be running jobs from other users. ITaP research systems have many processor cores in each compute node, so node sharing allows more efficient use of the system. However, if you have special needs that prohibit others from effectively sharing a compute node with your job, such as needing all of the memory on a compute node, you may request exclusive access to any compute nodes allocated to your job.

        To request exclusive access to a compute node, set ppn to the maximum number of processor cores physically available on a compute node:

        $ qsub -l nodes=1:ppn=16 myjobsubmissionfile
        

        If more convenient, you may also specify any command line options to qsub from within your job submission file, using a special form of comment:

        #!/bin/sh -l
        # FILENAME:  myjobsubmissionfile
        
        #PBS -V
        #PBS -q myqueuename
        #PBS -l nodes=1:ppn=16
        #PBS -l walltime=01:30:00
        #PBS -N myjobname
        
        # Print the hostname of the compute node on which this job is running.
        /bin/hostname
        

        If an option is present in both your job submission file and on the command line, the option on the command line will take precedence.
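
        For example, submitting the file above while also specifying a walltime on the command line causes the command-line value to override the #PBS -l walltime directive in the file:

        $ qsub -l walltime=00:30:00 myjobsubmissionfile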

        After you submit your job with qsub, it can reside in a queue for minutes, hours, or even weeks. How long it takes for a job to start depends on the specific queue, the number of compute nodes requested, the amount of wall time requested, and what other jobs already waiting in that queue requested as well. It is impossible to say for sure when any given job will start. For best results, request no more resources than your job requires.

        PBS catches only output written to standard output and standard error. Standard output (output normally sent to the screen) will appear in your directory in a file whose extension begins with the letter "o", for example myjobsubmissionfile.o1234, where "1234" represents the PBS job ID. Errors that occurred during the job run and written to standard error (output also normally sent to the screen) will appear in your directory in a file whose extension begins with the letter "e", for example myjobsubmissionfile.e1234. Often, the error file is empty. If your job wrote results to a file, those results will appear in that file.
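
        For example, once the job finishes you could inspect the captured standard output and standard error (the job ID 1234 is only illustrative):

        $ cat myjobsubmissionfile.o1234
        $ cat myjobsubmissionfile.e1234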

        Parallel applications may require special care in the selection of PBS resources. Please refer to the sections that follow for details on how to run parallel applications with various parallel libraries.

      • 7.1.5  Job Status

        7.1.5  Job Status

        The command qstat -a will list all jobs currently queued or running and some information about each:

        $ qstat -a
        
        carter-adm.rcac.purdue.edu:
                                                                           Req'd  Req'd   Elap
        Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
        ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
        107025.carter-adm  user123      standby  hello         --    1   8    --  00:05 Q   --
        115505.carter-adm  user456      ncn      job4         5601   1   1    --  600:0 R 575:0
        ...
        189479.carter-adm  user456      standby  AR4b          --    5  40    --  04:00 H   --
        189481.carter-adm  user789      standby  STDIN        1415   1   1    --  00:30 R 00:07
        189483.carter-adm  user789      standby  STDIN        1758   1   1    --  00:30 R 00:07
        189484.carter-adm  user456      standby  AR4b          --    5  40    --  04:00 H   --
        189485.carter-adm  user456      standby  AR4b          --    5  40    --  04:00 Q   --
        189486.carter-adm  user123      tg_workq STDIN         --    1   1    --  12:00 Q   --
        189490.carter-adm  user456      standby  job7        26655   1   8    --  04:00 R 00:06
        189491.carter-adm  user123      standby  job11         --    1   8    --  04:00 Q   --
        

        The status of each job listed appears in the "S" column toward the right. Possible status codes are: "Q" = Queued, "R" = Running, "C" = Completed, and "H" = Held.

        To see only your own jobs, use the -u option to qstat and specify your own username:

        $ qstat -a -u myusername
        
        carter-adm.rcac.purdue.edu:
                                                                           Req'd  Req'd   Elap
        Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
        ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
        182792.carter-adm  myusername   standby  job1        28422   1   4    --  23:00 R 20:19
        185841.carter-adm  myusername   standby  job2        24445   1   4    --  23:00 R 20:19
        185844.carter-adm  myusername   standby  job3        12999   1   4    --  23:00 R 20:18
        185847.carter-adm  myusername   standby  job4        13151   1   4    --  23:00 R 20:18
        

        To retrieve useful information about your queued or running job, use the checkjob command with your job's ID number. The output should look similar to the following:

        $ checkjob -v 163000
        
        job 163000 (RM job '163000.carter-adm.rcac.purdue.edu')
        
        AName: test
        State: Idle
        Creds:  user:myusername  group:mygroup  class:myqueue
        WallTime:   00:00:00 of 20:00:00
        SubmitTime: Wed Apr 18 09:08:37
          (Time Queued  Total: 1:24:36  Eligible: 00:00:23)
        
        NodeMatchPolicy: EXACTNODE
        Total Requested Tasks: 2
        Total Requested Nodes: 1
        
        Req[0]  TaskCount: 2  Partition: ALL
        TasksPerNode: 2  NodeCount:  1
        
        
        Notification Events: JobFail
        
        IWD:            /home/myusername/gaussian
        UMask:          0000
        OutputFile:     carter-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.o163000
        ErrorFile:      carter-fe00.rcac.purdue.edu:/home/myusername/gaussian/test.e163000
        User Specified Partition List:   carter-adm,SHARED
        Partition List: carter-adm
        SrcRM:          carter-adm  DstRM: carter-adm  DstRMJID: 163000.carter-adm.rcac.purdue.edu
        Submit Args:    -l nodes=1:ppn=2,walltime=20:00:00 -q myqueue
        Flags:          RESTARTABLE
        Attr:           checkpoint
        StartPriority:  1000
        PE:             2.00
        NOTE:  job violates constraints for partition carter-adm (job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160))
        
        BLOCK MSG: job 163000 violates active HARD MAXPROC limit of 160 for class myqueue  partition ALL (Req: 2  InUse: 160) (recorded at last scheduling iteration)
        

        There are several useful bits of information in this output.

        • State lets you know if the job is Idle, Running, Completed, or Held.
        • WallTime will show how long the job has run and its maximum time.
        • SubmitTime is when the job was submitted to the cluster.
        • Total Requested Tasks is the total number of cores used for the job.
        • Total Requested Nodes and NodeCount are the number of nodes used for the job.
        • TasksPerNode is the number of cores used per node.
        • IWD is the job's working directory.
        • OutputFile and ErrorFile are the locations of stdout and stderr of the job, respectively.
        • Submit Args will show the arguments given to the qsub command.
        • NOTE/BLOCK MSG will show details on why the job isn't running. The above error says that all the cores are in use on that queue and the job has to wait. Other errors may give insight as to why the job fails to start or is held.

        To view the output of a running job, use the qpeek command with your job's ID number. The -f option will continually output to the screen similar to tail -f, while qpeek without options will just output the whole file so far. Here is an example output from an application:

        $ qpeek -f 1651025
        TIMING: 600  CPU: 97.0045, 0.0926592/step  Wall: 97.0045, 0.0926592/step, 0.11325 hours remaining, 809.902344 MB of memory in use.
        ENERGY:     600    359272.8746    280667.4810     81932.7038      5055.7519       -4509043.9946    383233.0971         0.0000         0.0000    947701.9550       -2451180.1312       298.0766  -3398882.0862  -2442581.9707       298.2890           1125.0475        77.0325  10193721.6822         3.5650         3.0569
        
        TIMING: 800  CPU: 118.002, 0.104987/step  Wall: 118.002, 0.104987/step, 0.122485 hours remaining, 809.902344 MB of memory in use.
        ENERGY:     800    360504.1138    280804.0922     82052.0878      5017.1543       -4511471.5475    383214.3057         0.0000         0.0000    946597.3980       -2453282.3958       297.7292  -3399879.7938  -2444652.9520       298.0805            978.4130        67.0123  10193578.8030        -0.1088         0.2596
        
        TIMING: 1000  CPU: 144.765, 0.133817/step  Wall: 144.765, 0.133817/step, 0.148686 hours remaining, 809.902344 MB of memory in use.
        ENERGY:    1000    361525.2450    280225.2207     81922.0613      5126.4104       -4513315.2802    383460.2355         0.0000         0.0000    947232.8722       -2453823.2352       297.9291  -3401056.1074  -2445219.8163       297.9184            823.8756        43.2552  10193174.7961        -0.7191        -0.2392
        ...
        
      • 7.1.6  Job Hold

        7.1.6  Job Hold

        To place a hold on a job before it starts running, use the qhold command:

        $ qhold myjobid
        

        Once a job has started running, it cannot be placed on hold.

        To release a hold on a job, use the qrls command:

        $ qrls myjobid
        

        You can find the job ID with the qstat command, as explained in the PBS Job Status section.

      • 7.1.7  Job Dependencies

        7.1.7  Job Dependencies

        Job dependencies may be configured to ensure jobs start in a specified order. A job can be configured to run only after another job reaches a particular state, such as when that job starts or ends.

        These examples illustrate setting dependencies in several ways. Typically, dependencies are set by capturing and using the job ID from the last job submitted (see the sketch at the end of this section).

        To run a job after job myjobid has started:

        $ qsub -W depend=after:myjobid myjobsubmissionfile
        

        To run a job after job myjobid ends without error:

        $ qsub -W depend=afterok:myjobid myjobsubmissionfile
        

        To run a job after job myjobid ends with errors:

        $ qsub -W depend=afternotok:myjobid myjobsubmissionfile
        

        To run a job after job myjobid ends with or without errors:

        $ qsub -W depend=afterany:myjobid myjobsubmissionfile
        

        To set more complex dependencies on multiple jobs and conditions:

        $ qsub -W depend=after:myjobid1:myjobid2:myjobid3,afterok:myjobid4 myjobsubmissionfile
        

        Dependencies are an automated way of holding and releasing jobs. Jobs with a dependency are held until the condition is satisfied. Once the condition is satisfied, the jobs become eligible to run and must still queue as normal.
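
        As a minimal sketch (the submission filenames here are hypothetical), capture the job ID that qsub prints on standard output in a shell variable and pass it to the dependent submission:

        $ FIRST=$(qsub myfirstjob.sub)
        $ qsub -W depend=afterok:$FIRST mysecondjob.sub
        # (for tcsh or csh:  set FIRST = `qsub myfirstjob.sub`)

        Because qsub prints the full job ID (for example 1234.carter-adm.rcac.purdue.edu), the captured value can be used directly in the -W depend= option.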

      • 7.1.8  Job Cancellation

        7.1.8  Job Cancellation

        To stop a job before it finishes or remove it from a queue, use the qdel command:

        $ qdel myjobid
        

        You can find the job ID with the qstat command, as explained in the PBS Job Status section.

      • 7.1.9  Examples

        7.1.9  Examples

        To submit jobs successfully, you must understand how to request the right computing resources. This section contains examples of specific types of PBS jobs. These examples illustrate requesting various groupings of nodes and processor cores, using various parallel libraries, and running interactive jobs. You may wish to look here for an example that is most similar to your application and use a modified version of that example's job submission file for your jobs.

        • 7.1.9.1  Batch

          7.1.9.1  Batch

          This simple example submits the job submission file hello.sub to the standby queue on Carter and requests 4 nodes:

          $ qsub -q standby -l nodes=4,walltime=00:01:00 hello.sub
          99.carter-adm.rcac.purdue.edu
          

          Remember that ppn cannot be larger than the number of processor cores on each node (16 on Carter).
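
          For example, a whole-node variant of the request above, asking for all 16 cores on each of the 4 nodes, would be:

          $ qsub -q standby -l nodes=4:ppn=16,walltime=00:01:00 hello.sub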

          After your job finishes running, the ls command will show two new files in your directory, the .o and .e files:

          $ ls -l
          hello
          hello.c
          hello.out
          hello.sub
          hello.sub.e99
          hello.sub.o99
          

          If everything went well, then the file hello.sub.e99 will be empty, since it contains any error messages your program gave while running. The file hello.sub.o99 contains the output from your program.

          Using Environment Variables in a Job

          If you would like to see the value of the environment variables from within a PBS job, you can prepare a job submission file with an appropriate filename, here named env.sub:

          #!/bin/sh -l
          # FILENAME:  env.sub
          
          # Request four nodes, 1 processor core on each.
          #PBS -l nodes=4:ppn=1,walltime=00:01:00
          	
          # Change to the directory from which you submitted your job.
          cd $PBS_O_WORKDIR
          	
          # Show details, especially nodes.
          # The results of most of the following commands appear in the error file.
          echo $PBS_O_HOST
          echo $PBS_O_QUEUE
          echo $PBS_O_SYSTEM
          echo $PBS_O_WORKDIR
          echo $PBS_ENVIRONMENT
          echo $PBS_JOBID
          echo $PBS_JOBNAME
          
          # PBS_NODEFILE contains the names of assigned compute nodes.
          cat $PBS_NODEFILE
          

          Submit this job:

          $ qsub env.sub
          
        • 7.1.9.2  Multiple Node

          7.1.9.2  Multiple Node

          This section illustrates various requests for one or multiple compute nodes and ways of allocating the processor cores on these compute nodes. Each example submits a job submission file (myjobsubmissionfile.sub) to a batch session. The job submission file contains a single command cat $PBS_NODEFILE to show the names of the compute node(s) allocated. The list of compute node names indicates the geometry chosen for the job:

          #!/bin/sh -l
          # FILENAME:  myjobsubmissionfile.sub
          
          cat $PBS_NODEFILE
          

          All examples use the default queue of the cluster.

          One processor core on any compute node

          Such a job shares the compute node's other resources, in particular its memory, with other jobs. This request is typical of a serial job:

          $ qsub -l nodes=1 myjobsubmissionfile.sub

          Compute node allocated:

          carter-a139

          Two processor cores on any compute node(s)

          This request is typical of a distributed-memory (MPI) job:

          $ qsub -l nodes=2 myjobsubmissionfile.sub

          Compute node(s) allocated:

          carter-a139
          carter-a138
          

          All processor cores on one compute node

          The option ppn cannot be larger than the number of processor cores on each compute node of the machine in question. This request is typical of a shared-memory (OpenMP) job:

          $ qsub -l nodes=1:ppn=16 myjobsubmissionfile.sub

          Compute node allocated:

          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
          carter-a137
             

          All processor cores on any two compute nodes

          The option ppn cannot be larger than the number of processor cores on each compute node of the machine in question. This request is typical of a hybrid (distributed-memory and shared-memory) job:

          $ qsub -l nodes=2:ppn=16 myjobsubmissionfile.sub

          Compute nodes allocated:

          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a139
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
          carter-a138
             

          Multinode geometry from option nodes is one processor core per node (scattered placement)

          $ qsub -l nodes=8 myjobsubmissionfile.sub
          

          carter-a001
          carter-a003
          carter-a004
          carter-a005
          carter-a006
          carter-a007
          carter-a008
          carter-a009
          

          Multinode geometry from option procs is one or more processor cores per node (free placement)

          $ qsub -l procs=8 myjobsubmissionfile.sub
          

          The placement of processor cores can range from all on one compute node (packed) to all on unique compute nodes (scattered). A few examples follow:

          carter-a001
          carter-a001
          carter-a001
          carter-a001
          carter-a001
          carter-a001
          carter-a001
          carter-a001
          

          carter-a001
          carter-a001
          carter-a001
          carter-a002
          carter-a002
          carter-a003
          carter-a004
          carter-a004
          

          carter-a000
          carter-a001
          carter-a002
          carter-a003
          carter-a004
          carter-a005
          carter-a006
          carter-a007
          

          Four compute nodes, each with two processor cores

          $ qsub -l nodes=4:ppn=2 myjobsubmissionfile.sub
          

          carter-a001
          carter-a001
          carter-a003
          carter-a003
          carter-a004
          carter-a004
          carter-a005
          carter-a005
          

          Eight processor cores can come from any four compute nodes

          $ qsub -l nodes=4 -l procs=8 myjobsubmissionfile.sub
          

          carter-a001
          carter-a001
          carter-a003
          carter-a003
          carter-a004
          carter-a004
          carter-a005
          carter-a005
          

          Exclusive access to one compute node, using one processor core

          Achieving this geometry requires modifying the job submission file, here named myjobsubmissionfile.sub:

          #!/bin/sh -l
          # FILENAME:  myjobsubmissionfile.sub
          
          cat $PBS_NODEFILE
          uniq <$PBS_NODEFILE >nodefile
          echo " "
          cat nodefile
          

          To gain exclusive access to a compute node, specify all processor cores that are physically available on a compute node:

          $ qsub -l nodes=1:ppn=16 myjobsubmissionfile.sub
          

          carter-a005
          carter-a005
          ...
          carter-a005
          
          carter-a005
          

          This request is typical of a serial job that needs access to all of the memory of a compute node.

        • 7.1.9.3  Specific Types of Nodes

          7.1.9.3  Specific Types of Nodes

          You may also request that a job run on specific nodes based on criteria such as sub-cluster type, node memory, and/or job properties.

          These examples submit a job submission file, here named myjobsubmissionfile.sub, to the default queue. The job submission file contains a single command (cat $PBS_NODEFILE) to show the allocated node(s).

          Example: a job requires a compute node in an "A" sub-cluster:

          $ qsub -l nodes=1:A myjobsubmissionfile.sub 

          Compute node allocated:

          carter-a009

          Example: a job requires a compute node with 32 GB of physical memory:

          $ qsub -l nodes=1:nodemem32gb myjobsubmissionfile.sub 

          Compute node allocated:

          carter-a009

          Example: a job declares that it would require 32 GB of physical memory for itself (and thus needs a node that has more than that):

          $ qsub -l nodes=1,pmem=32gb myjobsubmissionfile.sub 

          Compute node allocated:

          carter-b009

          Note that the pmem=32gb job above does not run on a 32 GB node. Since the operating system requires some memory for itself (possibly about 2 GB, leaving just 30 GB free on a 32 GB node), a pmem=32gb job will not fit into such a node, and PBS will place the job on a larger-memory node. If the requested pmem= value is greater than the free RAM in the largest available node, the job will never start.

          The first two examples above (the A and nodemem32gb keywords) refer to node properties, while the third example (the pmem=32gb keyword) declares a job property. Node properties let you direct your job to the desired node type ("give me a 32 GB node" or "give me a node in sub-cluster A"). Job properties let you state what your job requires and leave it to the scheduler to find any node that meets those requirements (i.e. "give me a node that is capable of fitting my 32 GB job"). The former will always go to 32 GB nodes, while the latter may end up on either a 64 GB or a 256 GB node, whichever is available.

          Refer to the Detailed Hardware Specification section for a list of available sub-clusters and their respective per-node memory sizes for use with the nodemem keyword.

          Important exception: GPU-equipped nodes cannot be requested by specifying sub-cluster G. Instead, use the gpus= keyword, as described in detail in the GPGPU section.

        • 7.1.9.4  Interactive Job

          7.1.9.4  Interactive Job

          Interactive jobs can run on compute nodes. You can start interactive jobs either with specific time constraints (walltime=hh:mm:ss) or with the default time constraints of the queue to which you submit your job.

          If you request an interactive job without a wall time option, PBS assigns to your job the default wall time limit for the queue to which you submit. If this is shorter than the time you actually need, your job will terminate before completion. If, on the other hand, this time is longer than what you actually need, you are effectively withholding computing resources from other users. For this reason, it is best to always pass a reasonable wall time value to PBS for interactive jobs.

          Once your interactive job starts, you may use that connection as an interactive shell and invoke whatever other programs or other commands you wish. To submit an interactive job with one hour of wall time, use the -I option to qsub:

          $ qsub -I -l walltime=01:00:00
          qsub: waiting for job 100.carter-adm.rcac.purdue.edu to start
          qsub: job 100.carter-adm.rcac.purdue.edu ready
          

          If you need to use a remote X11 display from within your job (see the SSH X11 Forwarding Section), add the -v DISPLAY option to qsub as well:

          $ qsub -I -l walltime=01:00:00 -v DISPLAY
          qsub: waiting for job 101.carter-adm.rcac.purdue.edu to start
          qsub: job 101.carter-adm.rcac.purdue.edu ready
          

          To quit your interactive job:

          logout
          
        • 7.1.9.5  Serial

          7.1.9.5  Serial

          A serial job is a single process whose steps execute as a sequential stream of instructions on one processor core.

          This section illustrates how to use PBS to submit to a batch session one of the serial programs compiled in the section Compiling Serial Programs. There is no difference in running a Fortran, C, or C++ serial program after compiling and linking it into an executable file.

          Suppose that you named your executable file serial_hello. Prepare a job submission file with an appropriate filename, here named serial_hello.sub:

          #!/bin/sh -l
          # FILENAME:  serial_hello.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          
          ./serial_hello
          

          Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

          Submit the serial job to the default queue on Carter and request 1 compute node with 1 processor core and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster:

          $ qsub -l nodes=1:ppn=1,walltime=00:01:00 ./serial_hello.sub
          

          View two new files in your directory (.o and .e):

          $ ls -l
          serial_hello
          serial_hello.c
          serial_hello.sub
          serial_hello.sub.emyjobid
          serial_hello.sub.omyjobid
          

          View results in the output file:

          $ cat serial_hello.sub.omyjobid
          Runhost:carter-a139.rcac.purdue.edu   hello, world
          

          If the job failed to run, then view error messages in the file serial_hello.sub.emyjobid.

          If a serial job uses a lot of memory and may find the compute node's memory overcommitted while sharing the node with other jobs, request all of the processor cores physically available on the compute node to gain exclusive use of it:

          $ qsub -l nodes=1:ppn=16,walltime=00:01:00 serial_hello.sub
          

          View results in the output file:

          $ cat serial_hello.sub.omyjobid
          Runhost:carter-a139.rcac.purdue.edu   hello, world
          
        • 7.1.9.6  MPI

          7.1.9.6  MPI

          A message-passing job is a set of processes (often multiple copies of a single process) that take advantage of distributed-memory systems by communicating with each other via the sending and receiving of messages. Work occurs across several compute nodes of a distributed-memory system. The Message-Passing Interface (MPI) is a specification of the message-passing model, defined as a collection of library functions. OpenMPI, MVAPICH2, and Intel MPI (IMPI) are implementations of the MPI standard.

          This section illustrates how to use PBS to submit to a batch session one of the MPI programs compiled in the section Compiling MPI Programs. There is no difference in running a Fortran, C, or C++ MPI program after compiling and linking it into an executable file.

          The path to relevant MPI libraries is not set up on any run host by default. Using module load is the preferred way to access these libraries. Use module avail to see all software packages installed on Carter, including MPI library packages. Then, to use one of the available MPI modules, enter the module load command.

          Suppose that you named your executable file mpi_hello. Prepare a job submission file with an appropriate filename, here named mpi_hello.sub:

          #!/bin/sh -l
          # FILENAME:  mpi_hello.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          
          mpiexec -n 32 ./mpi_hello
          

          You can load any MPI library/compiler module that is available on Carter (This example uses the OpenMPI library).

          Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

          You invoke an MPI program with the mpiexec command. The number of processes requested with mpiexec -n is usually equal to the number of MPI ranks of the application and should typically be equal to the total number of processor cores you request from PBS (more on this below).
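
          If you prefer not to hard-code the rank count in the job submission file, one common approach (a sketch, not required by the examples in this guide) is to derive it from the node file PBS provides, which contains one line per allocated processor core:

          # Launch one MPI rank per processor core allocated by PBS.
          mpiexec -n $(wc -l < $PBS_NODEFILE) ./mpi_hello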

          Submit the MPI job to the default queue on Carter and request 2 compute nodes with all 16 processor cores and 16 MPI ranks on each compute node and 1 minute of wall time. This will use two complete compute nodes of the Carter cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

          $ qsub -l nodes=2:ppn=16,walltime=00:01:00 ./mpi_hello.sub
          

          View two new files in your directory (.o and .e):

          $ ls -l
          mpi_hello
          mpi_hello.c
          mpi_hello.sub
          mpi_hello.sub.emyjobid
          mpi_hello.sub.omyjobid
          

          View results in the output file:

          $ cat mpi_hello.sub.omyjobid
          Runhost:carter-a010.rcac.purdue.edu   Rank:0 of 32 ranks   hello, world
                   Runhost:carter-a010.rcac.purdue.edu   Rank:1 of 32 ranks   hello, world
                      ...
                   

          If the job failed to run, then view error messages in the file mpi_hello.sub.emyjobid.

          If an MPI job uses a lot of memory and 16 MPI ranks per compute node use all of the memory of the compute nodes, request more compute nodes (MPI ranks) and use fewer processor cores on each compute node, while keeping the total number of MPI ranks unchanged.

          Submit the job to the default queue with double the number of compute nodes and half the number of MPI ranks per compute node (the total number of MPI ranks remains unchanged). Use the -n (exclusive) flag to ensure that no other jobs share your nodes and consume the extra memory your job requires.

          $ qsub -l nodes=4:ppn=8,walltime=00:01:00 -n ./mpi_hello.sub
          

          View results in the output file:

          $ cat mpi_hello.sub.omyjobid
          Runhost:carter-c010.rcac.purdue.edu   Rank:0 of 32 ranks   hello, world
                   Runhost:carter-c010.rcac.purdue.edu   Rank:1 of 32 ranks   hello, world
                      ...
                   Runhost:carter-c011.rcac.purdue.edu   Rank:7 of 32 ranks   hello, world
                   Runhost:carter-c011.rcac.purdue.edu   Rank:8 of 32 ranks   hello, world
                      ...
                   Runhost:carter-c012.rcac.purdue.edu   Rank:14 of 32 ranks   hello, world
                   Runhost:carter-c012.rcac.purdue.edu   Rank:15 of 32 ranks   hello, world
                      ...
                   

          Notes

          • In general, the exact order in which MPI ranks output similar write requests to an output file is random.
          • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Carter is "standby".
          • Invoking an MPI program on Carter with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke an MPI program.

          For an introductory tutorial on how to write your own MPI programs:

        • 7.1.9.7  OpenMP

          7.1.9.7  OpenMP

          A shared-memory job is a single process that takes advantage of a multi-core processor and its shared memory to achieve a form of parallel computing called multithreading. It distributes the work of a process over several processor cores of a multi-core processor. Open Multi-Processing (OpenMP) is a specific implementation of the shared-memory model and is a collection of parallelization directives, library routines, and environment variables.

          This section illustrates how to use PBS to submit to a batch session one of the OpenMP programs, either task parallelism or loop-level (data) parallelism, compiled in the section Compiling OpenMP Programs. There is no difference in running a Fortran, C, or C++ OpenMP program after compiling and linking it into an executable file.

          When running OpenMP programs, all threads must be on the same compute node to take advantage of shared memory. The threads cannot communicate between nodes.

          To run an OpenMP program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

          In csh:

          $ setenv OMP_NUM_THREADS mynumberofthreads
          

          In bash:

          $ export OMP_NUM_THREADS=mynumberofthreads
          

          Suppose that you named your executable file omp_hello. Prepare a job submission file with an appropriate name, here named omp_hello.sub:

          #!/bin/sh -l
          # FILENAME:  omp_hello.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          export OMP_NUM_THREADS=16
          ./omp_hello
          

          Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the program.

          Submit the OpenMP job to the default queue on Carter and request 1 complete compute node with all 16 processor cores (OpenMP threads) on the compute node and 1 minute of wall time. This will use one complete compute node of the Carter cluster. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

          $ qsub -l nodes=1:ppn=16,walltime=00:01:00 omp_hello.sub
          

          View two new files in your directory (.o and .e):

          $ ls -l
          omp_hello
          omp_hello.c
          omp_hello.sub
          omp_hello.sub.emyjobid
          omp_hello.sub.omyjobid
          

          View the results from one of the sample OpenMP programs about task parallelism:

          $ cat omp_hello.sub.omyjobid
          SERIAL REGION:     Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
          PARALLEL REGION:   Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 16 threads   hello, world
                PARALLEL REGION:   Runhost:carter-c044.rcac.purdue.edu   Thread:1 of 16 threads   hello, world
                   ...
                SERIAL REGION:     Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
          

          If the job failed to run, then view error messages in the file omp_hello.sub.emyjobid.

          If an OpenMP program uses a lot of memory and 16 threads use all of the memory of the compute node, use fewer processor cores (OpenMP threads) on that compute node.

          Modify the job submission file omp_hello.sub to use half the number of processor cores:

          #!/bin/sh -l
          # FILENAME:  omp_hello.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          export OMP_NUM_THREADS=8
          ./omp_hello
             

          Submit the job to the default queue. Be sure to request the whole node or other jobs may use the extra memory your job requires.

          $ qsub -l nodes=1:ppn=16,walltime=00:01:00 omp_hello.sub
             

          View the results from one of the sample OpenMP programs about task parallelism and using half the number of processor cores:

          $ cat omp_hello.sub.omyjobid
          
          SERIAL REGION:     Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
             PARALLEL REGION:   Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 8 threads   hello, world
                   PARALLEL REGION:   Runhost:carter-c044.rcac.purdue.edu   Thread:1 of 8 threads   hello, world
                      ...
                   SERIAL REGION:     Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
             

          Practice submitting the sample OpenMP program about loop-level (data) parallelism:

          #!/bin/sh -l
          # FILENAME:  omp_loop.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          export OMP_NUM_THREADS=16
          ./omp_loop
          

          $ qsub -l nodes=1:ppn=16,walltime=00:01:00 omp_loop.sub
          

          SERIAL REGION:   Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
          PARALLEL LOOP:   Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 16 threads   Iteration:0  hello, world
          PARALLEL LOOP:   Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 16 threads   Iteration:1  hello, world
                PARALLEL LOOP:   Runhost:carter-c044.rcac.purdue.edu   Thread:1 of 16 threads   Iteration:2  hello, world
          PARALLEL LOOP:   Runhost:carter-c044.rcac.purdue.edu   Thread:1 of 16 threads   Iteration:3  hello, world
                   ...
                SERIAL REGION:   Runhost:carter-c044.rcac.purdue.edu   Thread:0 of 1 thread    hello, world
          
        • 7.1.9.8  Hybrid

          7.1.9.8  Hybrid

          A hybrid job combines both message-passing and shared-memory attributes to take advantage of distributed-memory systems with multi-core processors. Work occurs across several compute nodes of a distributed-memory system and across the processor cores of the multi-core processors.

          This section illustrates how to use PBS to submit to a batch session one of the hybrid programs compiled in the section Compiling Hybrid Programs. There is no difference in running a Fortran, C, or C++ hybrid program after compiling and linking it into an executable file.

          The path to relevant MPI libraries is not set up on any compute node by default. Using module load is the preferred way to access these libraries. Use module avail to see all software packages installed on Carter, including MPI library packages. Then, to use one of the available MPI modules, enter the module load command.

          When running hybrid programs, use all processor cores of the compute nodes to take advantage of shared memory.

          To run a hybrid program, set the environment variable OMP_NUM_THREADS to the desired number of threads:

          In csh:

          $ setenv OMP_NUM_THREADS mynumberofthreads
          

          In bash:

          $ export OMP_NUM_THREADS=mynumberofthreads
          

          Suppose that you named your executable file hybrid_hello. Prepare a job submission file with an appropriate filename, here named hybrid_hello.sub:

          #!/bin/sh -l
          # FILENAME:  hybrid_hello.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          uniq <$PBS_NODEFILE >nodefile
          export OMP_NUM_THREADS=16
          mpiexec -n 2 -machinefile nodefile ./hybrid_hello
          

          You can load any MPI library/compiler module that is available on Carter. This example uses the OpenMPI library.

          Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the job's run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the executable program.

          You invoke a hybrid program with the mpiexec command. You may need to specify how to place the threads on the compute node. Several examples on how to specify thread placement with various MPI libraries are at the bottom of this section. The number of processes requested with mpiexec -n is usually equal to the number of MPI ranks of the application (more on this below).

          Submit the hybrid job to the default queue on Carter and request 2 whole compute nodes with 1 MPI rank and all 16 processor cores (OpenMP threads) on each compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it. Job completion can take a while depending on the demand placed on the compute cluster.

          $ qsub -l nodes=2:ppn=16,walltime=00:01:00 hybrid_hello.sub
          179168.carter-adm.rcac.purdue.edu
          

          View two new files in your directory (.o and .e):

          $ ls -l
          hybrid_hello
          hybrid_hello.c
          hybrid_hello.sub
          hybrid_hello.sub.emyjobid
          hybrid_hello.sub.omyjobid
          

          View the results from one of the sample hybrid programs about task parallelism:

          $ cat hybrid_hello.sub.omyjobid
          SERIAL REGION:     Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
          PARALLEL REGION:   Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 16 threads   hello, world
          PARALLEL REGION:   Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 16 threads   hello, world
             ...
          SERIAL REGION:     Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
             

          If the job failed to run, then view error messages in the file hybrid_hello.sub.emyjobid.

          If a hybrid job uses a lot of memory and 16 OpenMP threads per compute node use all of the memory of the compute nodes, request more compute nodes (MPI ranks) and use fewer processor cores (OpenMP threads) on each compute node.

          Prepare a job submission file with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

          #!/bin/sh -l
          # FILENAME:  hybrid_hello.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          uniq <$PBS_NODEFILE >nodefile
          export OMP_NUM_THREADS=8
          mpiexec -n 4 -machinefile nodefile ./hybrid_hello
             

          Submit the job to the default queue on Carter with double the number of compute nodes (MPI ranks). Be sure to request the whole node or other jobs may use the extra memory your job requires.

          $ qsub -l nodes=4:ppn=16,walltime=00:01:00 hybrid_hello.sub   

          View the results from one of the sample hybrid programs about task parallelism with double the number of compute nodes (MPI ranks) and half the number of processor cores (OpenMP threads):

          $ cat hybrid_hello.sub.omyjobid
          SERIAL REGION:     Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 1 thread    hello, world
          PARALLEL REGION:   Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 8 threads   hello, world
          PARALLEL REGION:   Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:1 of 8 threads   hello, world
             ...
          SERIAL REGION:     Runhost:carter-a020.rcac.purdue.edu   Rank:0 of 4 ranks, Thread:0 of 1 thread    hello, world
          SERIAL REGION:     Runhost:carter-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 1 thread    hello, world
          PARALLEL REGION:   Runhost:carter-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 8 threads   hello, world
          PARALLEL REGION:   Runhost:carter-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:1 of 8 threads   hello, world
             ...
          SERIAL REGION:     Runhost:carter-a021.rcac.purdue.edu   Rank:1 of 4 ranks, Thread:0 of 1 thread    hello, world
          SERIAL REGION:     Runhost:carter-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 1 thread    hello, world
          PARALLEL REGION:   Runhost:carter-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 8 threads   hello, world
          PARALLEL REGION:   Runhost:carter-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:1 of 8 threads   hello, world
             ...
          SERIAL REGION:     Runhost:carter-a022.rcac.purdue.edu   Rank:2 of 4 ranks, Thread:0 of 1 thread    hello, world
                   

          Practice submitting the sample hybrid program about loop-level (data) parallelism:

          #!/bin/sh -l
          # FILENAME:  hybrid_loop.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          uniq <$PBS_NODEFILE >nodefile
          export OMP_NUM_THREADS=16
          mpiexec -n 2 -machinefile nodefile ./hybrid_loop
          

          $ qsub -l nodes=2:ppn=16,walltime=00:01:00 hybrid_loop.sub
          

          SERIAL REGION:   Runhost:carter-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world
          PARALLEL LOOP:   Runhost:carter-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 16 threads   Iteration:0   hello, world
          PARALLEL LOOP:   Runhost:carter-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 16 threads   Iteration:1   hello, world
          PARALLEL LOOP:   Runhost:carter-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 16 threads   Iteration:2   hello, world
          PARALLEL LOOP:   Runhost:carter-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:1 of 16 threads   Iteration:3   hello, world
             ...
          SERIAL REGION:   Runhost:carter-a044.rcac.purdue.edu   Rank:0 of 2 ranks, Thread:0 of 1 thread    hello, world

          Thread placement

          Compute nodes are made up of two or more processor chips, or sockets. Typically each socket shares a memory controller and communication busses for all of its cores. Consider these cores as having "shortcuts" to each other. Cores within a socket can communicate faster and more efficiently amongst themselves than with another socket or compute node. MPI ranks should consequently be placed so that they can utilize these "shortcuts". When running hybrid codes it is essential to specify this placement, as by default some MPI libraries will limit a rank to a single core or may scatter a rank across processor chips.

          Below are examples of how to specify this placement with several MPI libraries. Hybrid codes should be run within jobs that request entire nodes, either by using ppn=16 or the -n (exclusive) flag; otherwise the job may end up with unexpected and poor thread placement.

          OpenMPI 1.6.3

          mpiexec -cpus-per-rank $OMP_NUM_THREADS --bycore -np 2 -machinefile nodefile ./hybrid_loop

          OpenMPI 1.8

          mpiexec -map-by socket:pe=$OMP_NUM_THREADS -np 2 -machinefile nodefile ./hybrid_loop

          Intel MPI

          mpiexec -np 2 -machinefile nodefile ./hybrid_loop

          MVAPICH2

          mpiexec -env MV2_ENABLE_AFFINITY 0 -np 2 -machinefile nodefile ./hybrid_loop

          Notes

          • In general, the exact order in which MPI processes of a hybrid program output similar write requests to an output file is random.
          • Use qlist to determine which queues are available to you. The name of the queue which is available to everyone on Carter is "standby".
          • Invoking a hybrid program on Carter with ./program is typically wrong, since this will use only one MPI process and defeat the purpose of using MPI. Unless that is what you want (rarely the case), you should use mpiexec to invoke a hybrid program.
        • 7.1.9.9  GPGPU

          7.1.9.9  GPGPU

          The Carter cluster has twelve nodes, each with three NVIDIA Tesla M2090 GPUs. Each Tesla GPU contains 512 stream processors and 6 GB of memory. NVIDIA cards support API extensions, such as CUDA and OpenCL, to many programming languages including C, C++, and Fortran.

          This section illustrates how to use PBS to submit to a batch session a simple GPU program. There is no difference in running a Fortran, C, or C++ GPU program after compiling and linking it into an executable file.

          Suppose that you named your executable file gpu_hello from the sample code gpu_hello.cu. Prepare a job submission file with an appropriate name, here named gpu_hello.sub:

          #!/bin/sh -l
          # FILENAME:  gpu_hello.sub
          
          module load cuda
          
          cd $PBS_O_WORKDIR
          
          host=`hostname -s`
          gpus=`cat $PBS_GPUFILE | grep $host | cut -d'-' -f3 | cut -c4 | sort`
          export CUDA_VISIBLE_DEVICES=`echo $gpus | tr ' ' ','`
          
          ./gpu_hello
          

          Since PBS always sets the working directory to your home directory, you should either execute the cd $PBS_O_WORKDIR command, which will set the run-time current working directory to the directory from which you submitted the job submission file via the qsub command, or give the full path to the directory containing the program.

          The PBS system provides several mechanisms to aid in the request, allocation, and use of GPUs on a compute node. The option gpus selects the desired number of GPUs per compute node. On Carter GPU-enabled nodes, up to three GPUs can be selected per compute node. The gpus option operates very similarly to the ppn option used to select the desired number of processor cores per compute node. The option gpus cannot be larger than the number of GPUs in the compute nodes.

          During job run-time, PBS sets an environment variable, $PBS_GPUFILE, whose value is the path of a file listing the GPUs allocated to the job. This is very similar to the file named by $PBS_NODEFILE. The $PBS_GPUFILE variable will only be set if the gpus option is provided at job submission. Use of this option is recommended to avoid accidentally running on GPUs occupied by other users. More detailed information on using allocated GPUs can be found in this example code and below.

          Submit the GPU job to a GPU-enabled queue on Carter, such as standby-g, and request one compute node, one CPU core, and one GPU with one minute of wall time. Job completion can take a while depending on the demand placed on the GPU-enabled compute nodes. GPU-enabled compute nodes are not available from the default standby queue.

          $ qsub -q standby-g -l nodes=1:gpus=1,walltime=00:01:00 gpu_hello.sub
          

          View two new files in your directory (.o and .e):

          $ ls -l
          gpu_hello
          gpu_hello.cu
          gpu_hello.sub
          gpu_hello.sub.emyjobid
          gpu_hello.sub.omyjobid
          

          View results in the file for all standard output, gpu_hello.sub.omyjobid

          hello, world
          

          If the job failed to run, then view error messages in the file gpu_hello.sub.emyjobid.

          A few examples of GPU job submission and GPU allocation follow:

          One compute node with one GPU:

          $ qsub -q standby-g -l nodes=1:gpus=1 myjobsubmissionfile.sub
          
          carter-g000-gpu0
          

          One compute node with three GPUs:

          $ qsub -q standby-g -l nodes=1:gpus=3 myjobsubmissionfile.sub
          
          carter-g000-gpu2
          carter-g000-gpu1
          carter-g000-gpu0
          

          Three compute nodes with one GPU each:

          $ qsub -q standby-g -l nodes=3:gpus=1 myjobsubmissionfile.sub
          
          carter-g000-gpu0
          carter-g001-gpu0
          carter-g002-gpu0
          

          To select which CUDA device to use with a CUDA C program, use the cudaSetDevice( int device ) API call to set the device. All subsequent CUDA memory allocations or kernel launches will be performed on this device. This example takes the device number as the first command line argument:

          if (cudaSetDevice(atoi(argv[1])) != cudaSuccess) {
              int num_devices;
              cudaGetDeviceCount(&num_devices);
              fprintf(stderr, "Error initializing device %s, device value must be 0-%d\n", argv[1], (num_devices-1));
              return 0;
          }
          

          The device value(s) allocated to your job can be determined within the batch submission script and either exported via CUDA_VISIBLE_DEVICES or passed to your program on the command line:

          host=`hostname -s`
          gpus=`cat $PBS_GPUFILE | grep $host | cut -d'-' -f3 | cut -c4 | sort`
          export CUDA_VISIBLE_DEVICES=`echo $gpus | tr ' ' ','`
          ./gpu_hello
          

          Using multiple GPUs within a program will require more complex processing of the $PBS_GPUFILE file.
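
          For example, a minimal sketch (assuming a single-node job, and that the program accepts its device index as the first command-line argument, as in the cudaSetDevice example above) that starts one copy of the program per allocated GPU:

          host=`hostname -s`
          # One loop iteration per GPU index allocated to this job on this host.
          for gpu in `cat $PBS_GPUFILE | grep $host | cut -d'-' -f3 | cut -c4 | sort`
          do
              ./gpu_hello $gpu &
          done
          wait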

        • 7.1.9.10  Scratch File

          7.1.9.10  Scratch File

          Some applications process data stored in a large input data file. The size of this file may be so large that it cannot fit within the quota of a home directory. This file might reside on Fortress or some other external storage medium. The way to process this file on Carter is to copy it to your scratch directory where a job running on a compute node of Carter may access it.
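
          For example, assuming the input file already resides in your home directory, copy it to your scratch directory before submitting the job:

          $ cp ~/mybiginputdatafile $RCAC_SCRATCH/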

          This section illustrates how to submit a small job that reads a data file residing on the scratch file system. This example, myprogram.c, displays the name of the compute node which runs the job, the path name of the current working directory, and the contents of that directory, and copies the contents of an input scratch file to an output scratch file. It uses standard Linux commands to access this system information. To compile this program, see Compiling Serial Programs.

          Prepare a scratch file directory with a large input data file:

          $ ls -l $RCAC_SCRATCH
          total 96
          -rw-r----- 1 myusername itap   27 Jun  8 10:41 mybiginputdatafile
          

          Prepare a job submission file with the path to your scratch file directory listed as a command-line argument and with an appropriate filename, here named myjob.sub:

          #!/bin/sh -l
          # FILENAME:  myjob.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          
          ./myprogram $RCAC_SCRATCH
          

          Submit this job to the default queue on Carter and request 1 processor core of 1 compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it.

          $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
          

          View two new files in the home directory (.o and .e):

          $ ls -l
          total 160
          -rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
          -rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
          -rw------- 1 myusername itap    0 Jun  8 11:05 myjob.sub.e266283
          -rw------- 1 myusername itap  780 Jun  8 11:05 myjob.sub.o266283
          -rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram*
          -rw-r--r-- 1 myusername itap 3930 Jun  8 11:13 myprogram.c
          

          View one new file in the scratch file directory, mybigoutputdatafile:

          $ ls -l $RCAC_SCRATCH
          total 96
          -rw-r----- 1 myusername itap   27 Jun  8 10:41 mybiginputdatafile
          -rw-r--r-- 1 myusername itap   42 Jun  8 11:05 mybigoutputdatafile
          

          View results in the output file:

          $ cat myjob.sub.o266283
          Warning: no access to tty (Bad file descriptor).
          Thus no job control in this shell.
          carter-d036.rcac.purdue.edu
          /home/myusername
          total 128
          -rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
          -rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
          -rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram
          -rw-r--r-- 1 myusername itap 3976 Jun  8 10:45 myprogram.c
          total 128
          -rw-r--r-- 1 myusername itap   54 Jun  8 10:29 README
          -rw-r--r-- 1 myusername itap  136 Jun  8 11:04 myjob.sub
          -rwxr-xr-x 1 myusername itap 9526 Jun  8 11:04 myprogram
          -rw-r--r-- 1 myusername itap 3976 Jun  8 10:45 myprogram.c
          ***  MAIN START  ***
          
          input scratch file:   /scratch/carter/m/myusername/mybiginputdatafile
          output scratch file:  /scratch/carter/m/myusername/mybigoutputdatafile
          scratch file system:  textfromscratchfile
          
          ***  MAIN  STOP  ***
          

          The output shows the name of the compute node which PBS chose to run the job, the path of the current working directory (the user's home directory), before-and-after listings of the contents of the current working directory, and output from the application. The output scratch file named mybigoutputdatafile, the primary output of this program, appears in the scratch directory, not the home directory.

        • 7.1.9.11  /tmp File

          7.1.9.11  /tmp File

          Some applications write a large amount of intermediate data to a temporary file during an early part of the process then read that data for further processing during a later part of the process. The size of this file may be so large that it cannot fit within the quota of a home directory or that it requires too much I/O activity between the compute node and either the home directory or the scratch file directory. The way to process this intermediate file on Carter is to use the /tmp directory of the compute node which runs the job. Used properly, /tmp may provide faster local storage to an active process than any other storage option.
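
          A minimal sketch of this pattern at the shell level (the stage commands and filenames here are hypothetical; the actual example below does its /tmp processing inside the C program) writes the intermediate file to /tmp, reads it back, and removes it before the job ends:

          #!/bin/sh -l
          # FILENAME:  mytmpjob.sub

          cd $PBS_O_WORKDIR

          TMPFILE=/tmp/myintermediatefile.$PBS_JOBID    # unique name per job
          ./myfirststep  > $TMPFILE                     # write intermediate data to node-local /tmp
          ./mysecondstep < $TMPFILE > myresults.out     # read it back for further processing
          rm -f $TMPFILE                                # clean up /tmp before the job exits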

          This section illustrates how to submit a small job which first writes and then reads an intermediate data file residing in the /tmp directory. This example, myprogram.c, displays the contents of the /tmp directory before and after processing. It uses standard Linux commands to access this system information. To compile this program, see Compiling Serial Programs.

          Prepare a job submission file with an appropriate filename, here named myjob.sub:

          #!/bin/sh -l
          # FILENAME:  myjob.sub
          
          module load devel
          cd $PBS_O_WORKDIR
          
          ./myprogram
          

          Submit this job to the default queue on Carter and request 1 processor core of 1 compute node and 1 minute of wall time. Requesting the default queue does not require explicitly asking for it:

          $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
          

          View results in the output file, myjob.sub.omyjobid:

          Warning: no access to tty (Bad file descriptor).
          Thus no job control in this shell.
          -rw-r--r-- 1 myusername itap 12 Jun 16 11:36 /tmp/mytmpfile
          ***  MAIN START  ***
          
          /tmp file data:  abcdefghijk
          
          ***  MAIN  STOP  ***
          

          The output verifies the existence of the intermediate data file in the /tmp directory.

          View results in the error file, myjob.sub.emyjobid:

          ls: /tmp/mytmpfile: No such file or directory
          

          The results in the error file verify that the intermediate data file does not exist at the start of processing.

          While the /tmp directory can provide faster local storage to an active process than other storage options, you never know how much storage is available in the /tmp directory of the compute node chosen to run your job. If an intermediate data file consistently fails to fit in the /tmp directories of a set of compute nodes, consider limiting the pool of candidate compute nodes to those which can handle your intermediate data file.
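
          For example, reusing the node-property syntax from the Specific Types of Nodes section (shown here with the A property; substitute whichever property identifies nodes suitable for your intermediate file), you could restrict the job to a particular set of nodes:

          $ qsub -l nodes=1:A,walltime=00:01:00 myjob.sub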

        • 7.1.9.12  InfiniBand

          7.1.9.12  InfiniBand

          InfiniBand is available on Carter. To make use of InfiniBand, you must use openmpi or mvapich2. To load either of these, use module avail (with openmpi or mvapich2) to find an appropriate version, then use module load, for example module load openmpi/1.4.4_intel-12.0.084.
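
          For example (a sketch; the versions listed by module avail will vary over time):

          $ module avail openmpi
          $ module load openmpi/1.4.4_intel-12.0.084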

        • 7.1.9.13  Commercial and Third-Party Applications

          7.1.9.13  Commercial and Third-Party Applications

          Several commercial and third-party software packages are available on Carter and accessible through PBS.

          We try to continually test the examples in the next few sections, but you may find some differences. If you need assistance, please contact us.

          With the exception of Octave and R, which are free software, only Purdue affiliates may use the following licensed software.

          • 7.1.9.13.1  Gaussian

            7.1.9.13.1  Gaussian

            Gaussian is a computational chemistry software package for electronic structure calculations. This section illustrates how to submit a small Gaussian job to a PBS queue. This Gaussian example runs the Fletcher-Powell multivariable optimization.

            Prepare a Gaussian input file with an appropriate filename, here named myjob.com. The final blank line is necessary:

            #P TEST OPT=FP STO-3G OPTCYC=2
            
            STO-3G FLETCHER-POWELL OPTIMIZATION OF WATER
            
            0 1
            O
            H 1 R
            H 1 R 2 A
            
            R 0.96
            A 104.
            
            

            To submit this job, load Gaussian then run the provided script, named subg09. This job uses one compute node with 8 processor cores:

            $ module load gaussian09
            $ subg09 myjob -l nodes=1:ppn=8
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for Gaussian output, here named myjob.log. Only the first and last few lines appear here:

             Entering Gaussian System, Link 0=/apps/rhel5/g09-B.01/g09/g09
             Initial command:
             /apps/rhel5/g09-B.01/g09/l1.exe /scratch/carter/m/myusername/gaussian/Gau-7781.inp -scrdir=/scratch/carter/m/myusername/gaussian/
             Entering Link 1 = /apps/rhel5/g09-B.01/g09/l1.exe PID=      7782.
            
             Copyright (c) 1988,1990,1992,1993,1995,1998,2003,2009,2010,
                        Gaussian, Inc.  All Rights Reserved.
            
            .
            .
            .
            
             Job cpu time:  0 days  0 hours  1 minutes 37.3 seconds.
             File lengths (MBytes):  RWF=      5 Int=      0 D2E=      0 Chk=      1 Scr=      1
             Normal termination of Gaussian 09 at Wed Mar 30 10:49:02 2011.
            real 17.11
            user 92.40
            sys 4.97
            Machine:
            carter-a389
            carter-a389
            carter-a389
            carter-a389
            carter-a389
            carter-a389
            carter-a389
            carter-a389
            

            Examples of Gaussian PBS Job Submissions

            Submit job using 4 processor cores on a single node:

            $ subg09 myjob -l nodes=1:ppn=4,walltime=200:00:00 -q myqueuename
               

            Submit job using 4 processor cores on each of 2 nodes:

            $ subg09 myjob -l nodes=2:ppn=4,walltime=200:00:00 -q myqueuename
               

            Submit job using 8 processor cores on a single node:

            $ subg09 myjob -l nodes=1:ppn=8,walltime=200:00:00 -q myqueuename
               

            Submit job using 8 processor cores on each of 2 nodes:

            $ subg09 myjob -l nodes=2:ppn=8,walltime=200:00:00 -q myqueuename
               

            For more information about Gaussian:

          • 7.1.9.13.2  Maple

            7.1.9.13.2  Maple

            Maple is a general-purpose computer algebra system. This section illustrates how to submit a small Maple job to a PBS queue. This Maple example differentiates, integrates, and finds the roots of polynomials.

            Prepare a Maple input file with an appropriate filename, here named myjob.in:

            # FILENAME:  myjob.in
            
            # Differentiate wrt x.
            diff( 2*x^3,x );
            
            # Integrate wrt x.
            int( 3*x^2*sin(x)+x,x );
            
            # Solve for x.
            solve( 3*x^2+2*x-1,x );
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load maple
            cd $PBS_O_WORKDIR
            
            # Use the -q option to suppress startup messages.
            # maple -q myjob.in
            maple myjob.in
            

            OR:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load maple
            
            # Use the -q option to suppress startup messages.
            # maple -q << EOF
            maple << EOF
            
            # Differentiate wrt x.
            diff( 2*x^3,x );
            
            # Integrate wrt x.
            int( 3*x^2*sin(x)+x,x );
            
            # Solve for x.
            solve( 3*x^2+2*x-1,x );
            EOF
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, here named myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
                                                     2
                                                  6 x
            
                                                                       2
                                  2                                   x
                              -3 x  cos(x) + 6 cos(x) + 6 x sin(x) + ----
                                                                      2
            
                                                1/3, -1
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about Maple:

          • 7.1.9.13.3  Mathematica

            7.1.9.13.3  Mathematica

            Mathematica implements numeric and symbolic mathematics. This section illustrates how to submit a small Mathematica job to a PBS queue. This Mathematica example finds the three roots of a third-degree polynomial.

            Prepare a Mathematica input file with an appropriate filename, here named myjob.in:

            (* FILENAME:  myjob.in *)
            
            (* Find roots of a polynomial. *)
            p=x^3+3*x^2+3*x+1
            Solve[p==0]
            Quit
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load mathematica
            cd $PBS_O_WORKDIR
            
            math < myjob.in
            

            OR:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load mathematica
            math << EOF
            
            (* Find roots of a polynomial. *)
            p=x^3+3*x^2+3*x+1
            Solve[p==0]
            Quit
            EOF
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, here named myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            Mathematica 5.2 for Linux x86 (64 bit)
            Copyright 1988-2005 Wolfram Research, Inc.
             -- Terminal graphics initialized --
            
            In[1]:=
            In[2]:=
            In[2]:=
            In[3]:=
                                 2    3
            Out[3]= 1 + 3 x + 3 x  + x
            
            In[4]:=
            Out[4]= {{x -> -1}, {x -> -1}, {x -> -1}}
            
            In[5]:=
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about Mathematica:

          • 7.1.9.13.4  MATLAB

            7.1.9.13.4  MATLAB

            MATLAB (MATrix LABoratory) is a high-level language and interactive environment for numerical computation, visualization, and programming. You can use MATLAB for a range of applications, including signal processing and communications, image and video processing, control systems, test and measurement, computational finance, and computational biology. MATLAB is a product of the MathWorks.

            MATLAB, Simulink, Compiler, and several of the optional toolboxes are available to faculty, staff, and students. To see the kinds and quantities of all MATLAB licenses, plus the number that you are currently using, use the matlab_licenses command:

            $ module load matlab
            $ matlab_licenses
                                                              Licenses
            MATLAB Product / Toolbox Name            myusername    Free    Total
            ==================================      ===========================
            Aerospace Blockset                              0       10       10
            Aerospace Toolbox                               0       20       20
            Bioinformatics Toolbox                          0       18       20
            Communication Toolbox                           0       17       30
            Compiler                                        0       15       15
            Control Toolbox                                 0       67       75
            Curve Fitting Toolbox                           0       51       95
            Data Acq Toolbox                                0       10       10
            Database Toolbox                                0        5        5
            Datafeed Toolbox                                0        5        5
            Dial and Gauge Blocks                           0       25       25
            Econometrics Toolbox                            0       11       15
            Excel Link                                      0        5        5
            Financial Toolbox                               0       14       15
            Fixed-Point Blocks                              0        5        5
            Fixed Point Toolbox                             0       20       20
            Fuzzy Toolbox                                   0       10       10
            GADS Toolbox                                    0       11       15
            Identification Toolbox                          0       15       15
            Image Acquisition Toolbox                       0        5        5
            Image Toolbox                                   0       81      120
            Instr Control Toolbox                           0       12       25
            MAP Toolbox                                     0       21       30
            MATLAB                                          0      450    1,000
            MATLAB Builder for dot Net                      0        1        1
            MATLAB Builder for Java                         0        0        1
            MATLAB Coder                                    0       27       35
            MATLAB Distrib Comp Server                      0      256      256
            MATLAB Excel Builder                            0        0        1
            MATLAB Report Gen                               0        2        2
            MBC Toolbox                                     0        5        5
            MPC Toolbox                                     0        5        5
            Neural Network Toolbox                          0       14       15
            OPC Toolbox                                     0        1        1
            Optimization Toolbox                            0       76      125
            Parallel Computing Toolbox                      0       38       50
            PDE Toolbox                                     0       15       15
            Power System Blocks                             0       26       30
            Real-Time Win Target                            0       10       17
            Real-Time Workshop                              0       12       35
            RF Toolbox                                      0        0        1
            Robust Toolbox                                  0        4        5
            RTW Embedded Coder                              0       15       15
            Signal Blocks                                   0       27       30
            Signal Toolbox                                  0       65      100
            SimBiology                                      0        5        5
            SimHydraulics                                   0       15       15
            SimMechanics                                    0        5        5
            Simscape                                        0       22       30
            SIMULINK                                        0       78      100
            Simulink Control Design                         0       15       15
            Simulink Design Optim                           0        5        5
            SIMULINK Report Gen                             0        2        2
            SL Verification Validation                      0        4        5
            Stateflow                                       0       14       15
            Statistics Toolbox                              0       19      120
            Symbolic Toolbox                                0       59       75
            Virtual Reality Toolbox                         0        5        5
            Wavelet Toolbox                                 0       14       15
            XPC Target                                      0       19       20
            

            The table lists the MATLAB toolboxes available at Purdue, the number of licenses that you are currently using, a snapshot of the number of licenses currently free, and the total number of licenses which Purdue owns for each product. To limit the output to only the toolboxes your jobs are currently using, use the -u flag, as shown below.
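
            For example, to show only the toolboxes your jobs currently hold:

            $ matlab_licenses -u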

            The MATLAB client can be run on the front end for application development; however, computationally intensive jobs must be run on compute nodes. This means using the MATLAB function batch(), or running your MATLAB client on a compute node through the PBS scheduler, as sketched below.
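
            For instance, a minimal sketch of running the MATLAB client interactively on a compute node through PBS (the walltime is only illustrative; if the module command is not initialized in the interactive session, source the module setup script as in the MEX/CUDA example later in this section):

            $ qsub -I -l nodes=1:ppn=1,walltime=04:00:00
            $ module load matlab
            $ matlab -nodisplay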

            MATLAB distinguishes three types of parallel jobs: distributed, matlabpool, and parallel. A distributed job is one or more independent, single-processor-core tasks of MATLAB statements, also known as an embarrassingly parallel job.

            A matlabpool (pool) job follows a master/worker model, in which one worker distributes and oversees the work accomplished by the rest of the worker pool. A pool job can also implement codistributed arrays as a means of handling data arrays which are too large to fit into the memory of any one compute node.

            A parallel job is a single task running concurrently on two or more processor cores. The copies of the task are not independent; they may interact with each other. A parallel job is also known as a data-parallel job.

            MATLAB also offers two kinds of profiles for parallel execution: the 'local' profile and user-defined cluster profiles. To run a MATLAB job on compute node(s) different from the node running the client, you must define a Cluster Profile using the Cluster Profile Manager. In addition, MATLAB offers implicit parallelism by default in the form of thread-parallel enabled functions.

            The following sections provide several examples illustrating how to submit MATLAB jobs to a Linux compute cluster.

            For more information about MATLAB:

          • 7.1.9.13.5  MATLAB (Cluster Profile Manager)

            7.1.9.13.5  MATLAB (Cluster Profile Manager)

            MATLAB offers two kinds of profiles for parallel execution: the 'local' profile and user-defined cluster profiles. The 'local' profile runs a MATLAB job on the processor core(s) of the same compute node that is running the client. To run a MATLAB job on compute node(s) different from the node running the client, you must define a Cluster Profile using the Cluster Profile Manager.

            To prepare a user-defined cluster profile, use the Cluster Profile Manager in the Parallel menu. This profile contains the PBS details (queue, nodes, ppn, walltime, etc.) of your job submission. Ultimately, your cluster profile will be an argument to MATLAB functions like batch(). To learn more about MATLAB's Parallel Computing Toolbox and setting up Cluster Profiles, please review the "Getting Started" section in the MATLAB documentation. The documentation for the release installed on Carter can be accessed directly from a MATLAB session:

            $ module load matlab
            $ matlab -nodisplay -singleCompThread
            >> doc distcomp
            

            For your convenience, ITaP provides a generic cluster profile that can be downloaded:

            To import the profile, start a MATLAB session and select Manage Cluster Profiles... from the Parallel menu. In the Cluster Profile Manager, select Import, navigate to the folder containing the profile, select mypbsprofile.settings and click OK. Please remember that the profile will need to be customized for your specific needs. If you have any questions, please contact us.

            For detailed information about MATLAB's Parallel Computing Toolbox, examples, demos, and tutorials:

          • 7.1.9.13.6  MATLAB (Interpreting an M-file)

            7.1.9.13.6  MATLAB (Interpreting an M-file)

            The MATLAB interpreter is the part of MATLAB which reads M-files and MEX-files and executes MATLAB statements.

            This section illustrates how to submit a small, serial, MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and displays three random numbers. The system function hostname returns two values: a code and the run host name.

            Prepare a MATLAB script M-file myscript.m, and a MATLAB function M-file myfunction.m:

            % FILENAME:  myscript.m
            
            % Display name of compute node which ran this job.
            [c name] = system('hostname');
            fprintf('\n\nhostname:%s\n', name);
            
            % Display three random numbers.
            A = rand(1,3);
            fprintf('%f %f %f\n', A);
            
            quit;
            

            % FILENAME:  myfunction.m
            
            function result = myfunction ()
            
                % Return name of compute node which ran this job.
                [c name] = system('hostname');
                result = sprintf('hostname:%s', name);
            
                % Return three random numbers.
                A = rand(1,3);
                r = sprintf('%f %f %f', A);
                result=strvcat(result,r);
            
            end
            

            Also, prepare a job submission file, here named myjob.sub, which runs MATLAB with the name of the script M-file:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # -nodisplay:        run MATLAB in text mode; X11 server not needed
            # -singleCompThread: turn off implicit parallelism
            # -r:                read MATLAB program; use MATLAB JIT Accelerator
            matlab -nodisplay -singleCompThread -r myscript
            

            Submit the job as a single compute node with one processor core:

            $ qsub -l nodes=1:ppn=1,walltime=00:01:00 myjob.sub
            

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            197986.carter-adm  myusername   standby  myjob.sub    4645   1   1    --  00:01 R 00:00
            

            Output shows one compute node (NDS) with one processor core (TSK).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a001.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            
            hostname:carter-a001.rcac.purdue.edu
            
            0.814724 0.905792 0.126987
            

            Output shows that a processor core on one compute node (a001) processed the entire job. One processor core processed myjob.sub and myscript.m. Output also displays the three random numbers.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about MATLAB:

          • 7.1.9.13.7  MATLAB Compiler (Compiling an M-file)

            7.1.9.13.7  MATLAB Compiler (Compiling an M-file)

            The MATLAB Compiler translates an M-file into a standalone application or software component. The MATLAB Compiler Runtime (MCR) is a standalone set of shared libraries. Together, compiling and the MCR enable the execution of MATLAB files, even outside the MATLAB environment. While you do need to purchase a MATLAB Compiler license to build an executable, you may freely distribute the executable and the MCR without license restrictions.

            This section illustrates how to compile and submit a small, serial, MATLAB program as a batch job to a PBS queue. This MATLAB program prints the name of the run host and displays three random numbers. The system function hostname returns two values: a code and the run host name.

            This example uses the MATLAB Compiler mcc to compile a MATLAB M-file. During compilation, the default cluster profile may be either the 'local' profile or your PBS cluster profile; the results will be the same. The compiled program then runs entirely on compute nodes, off the front end.

            The MATLAB Compiler license is a lingering license: using the compiler checks out a license for at least 30 minutes. For this reason, and to minimize your license usage, it is best to run the Compiler on only one cluster.

            Prepare either a MATLAB script M-file or a MATLAB function M-file. The method described below works for both.

            The MATLAB script M-file includes the MATLAB statement quit to ensure that the compiled program terminates. Use an appropriate filename, here named myscript.m:

            % FILENAME:  myscript.m
            
            % Display name of compute node which ran this job.
            [c name] = system('hostname');
            fprintf('\n\nhostname:%s\n', name)
            
            % Display three random numbers.
            A = rand(1,3);
            fprintf('%f %f %f\n', A);
            
            quit;
            

            The MATLAB function M-file has the usual function and end statements. Use an appropriate filename, here named myfunction.m:

            % FILENAME:  myfunction.m
            
            function result = myfunction ()
            
                % Return name of compute node which ran this job.
                [c name] = system('hostname');
                result = sprintf('hostname:%s', name);
            
                % Return three random numbers.
                A = rand(1,3);
                r = sprintf('%f %f %f', A);
                result=strvcat(result,r);
            
            end
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            ./run_myscript.sh /apps/rhel5/MATLAB/R2012a
            

            On a front end, load modules for MATLAB and GCC and verify the versions loaded. The MATLAB Compiler mcc depends on shared libraries from GCC Version 4.3.x. GCC 4.6.2 is available on Carter. Compile the MATLAB script M-file:

            $ module load matlab
            $ module load gcc
            $ mcc -m myscript.m
            

            A few new files appear after the compilation:

            mccExcludedFiles.log
            myscript
            myscript.prj
            myscript_main.c
            myscript_mcc_component_data.c
            readme.txt
            run_myscript.sh
            

            The name of the stand-alone executable file is myscript. The name of the shell script to run this executable file is run_myscript.sh.

            To obtain the name of the compute node which runs the compiler-generated script run_myscript.sh, insert the Linux commands echo and hostname before the script's first echo statement, so that the script appears as follows:

            #!/bin/sh
            # script for execution of deployed applications
            #
            # Sets up the MCR environment for the current $ARCH and executes
            # the specified command.
            #
            exe_name=$0
            exe_dir=`dirname "$0"`
            
            echo "run_myscript.sh"
            hostname
            
            echo "------------------------------------------"
            if [ "x$1" = "x" ]; then
              echo Usage:
              echo    $0 \ args
            else
              echo Setting up environment variables
              MCRROOT="$1"
              echo ---
              LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
              LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
              LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
                MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
              XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
              export LD_LIBRARY_PATH;
              export XAPPLRESDIR;
              echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
              shift 1
              "${exe_dir}"/myscript $*
            fi
            exit
            

            Submit the job:

            $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
            

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            378428.carter-adm  myusername   standby  myjob.sub   18964   1   1    --  00:01 R 00:00
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a001.rcac.purdue.edu
            run_myscript.sh
            carter-a001.rcac.purdue.edu
            ------------------------------------------
            Setting up environment variables
            ---
            LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2012a/runtime/glnxa64:/apps/rhel5/MATLAB_R2012a/bin/glnxa64:/apps/rhel5/MATLAB_R2012a/sys/os/glnxa
            64:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64
            /server:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2012a/sys/java/jre/glnxa64/jre/lib/amd64
            Warning: No display specified.  You will not be able to display graphics on the screen.
            
            
            hostname:carter-a001.rcac.purdue.edu
            
            0.814724 0.905792 0.126987
            

            Output shows the name of the compute node that ran the job submission file myjob.sub, the name of the compute node that ran the compiler-generated script run_myscript.sh, and the name of the compute node that ran the serial job: a001 in all three cases. Output also shows the three random numbers.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            To apply this method of job submission to a MATLAB function M-file, prepare a wrapper function which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

            % FILENAME:  mywrapper.m
            
            result = myfunction();
            disp(result)
            quit;
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            ./run_mywrapper.sh /apps/rhel5/MATLAB/R2012a
            

            Compile both the wrapper and the function, then submit the job:

            $ mcc -m mywrapper.m myfunction.m
            $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
            

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.
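
            For example, resubmitting the same job with a longer (illustrative) wall time:

            $ qsub -l nodes=1,walltime=04:00:00 myjob.sub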

            For more information about the MATLAB Compiler:

          • 7.1.9.13.8  MATLAB Executable (MEX-file)

            7.1.9.13.8  MATLAB Executable (MEX-file)

            MEX stands for MATLAB Executable. A MEX-file offers an interface which allows MATLAB code to call functions written in C, C++, or Fortran as though these external functions were built-in MATLAB functions. MATLAB also offers external interface functions that facilitate the transfer of data between MEX-files and MATLAB. A MEX-file usually starts by transferring data from MATLAB to the MEX-file; then it processes the data with the user-written code; and finally, it transfers the results back to MATLAB. This feature involves compiling then dynamically linking the MEX-file to the MATLAB program. You may wish to use a MEX-file if you would like to call an existing C, C++, or Fortran function directly from MATLAB rather than reimplementing that code as a MATLAB function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than MATLAB, you may be able to substantially improve performance over MATLAB source code, especially for statements like for and while. Areas of application include legacy code written in C, C++, or Fortran.

            This section illustrates how to use the PBS qsub command to submit a small MATLAB job with a MEX-file to a PBS queue.

            The first MEX example calls a C function which employs serial code to add two matrices. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license.

            The second MEX example calls a C function which employs CUDA to distribute the work of a shared-memory program (unrolled for loop) among many threads running on stream processors (SP) of a GPU. This example, when executed, uses the MATLAB interpreter, so it requires and checks out a MATLAB license. This example avoids using a PCT license.

            For the first example, prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

            /* Computational Routine */
            void matrixSum (double *a, double *b, double *c, int n) {
                int i;
            
                /* Matrix (component-wise) addition. */
                for (i = 0; i<n; i++) {
                    c[i] = a[i] + b[i];
                }
            }
            

            Combine the computational routine with a MEX-file, which contains the necessary external function interface of MATLAB. In the computational routine, change int to mwSize. Use an appropriate filename, here named matrixSum.c:

            /***********************************************************
             * FILENAME:  matrixSum.c
             *
             * Adds two MxN arrays (inMatrix).
             * Outputs one MxN array (outMatrix).
             *
             * The calling syntax is:
             *
             *      matrixSum (inMatrix, inMatrix, outMatrix, size)
             *
             * This is a MEX-file for MATLAB.
             *
             **********************************************************/
            
            #include "mex.h"
            
            /* Computational Routine */
            void matrixSum (double *a, double *b, double *c, mwSize n) {
                mwSize i;
            
                /* Component-wise addition. */
                for (i = 0; i<n; i++) {
                    c[i] = a[i] + b[i];
                }
            }
            
            /* Gateway Function */
            void mexFunction (int nlhs, mxArray *plhs[],
                              int nrhs, const mxArray *prhs[]) {
                double *inMatrix_a;               /* mxn input matrix  */
                double *inMatrix_b;               /* mxn input matrix  */
                mwSize nrows_a,ncols_a;           /* size of matrix a  */
                mwSize nrows_b,ncols_b;           /* size of matrix b  */
                double *outMatrix_c;              /* mxn output matrix */
            
                /* Check for proper number of arguments. */
                if(nrhs!=2) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:nrhs","Two inputs required.");
                }
                if(nlhs!=1) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:nlhs","One output required.");
                }
            
                /* Get dimensions of the first input matrix. */
                nrows_a = mxGetM(prhs[0]);
                ncols_a = mxGetN(prhs[0]);
                /* Get dimensions of the second input matrix. */
                nrows_b = mxGetM(prhs[1]);
                ncols_b = mxGetN(prhs[1]);
            
                /* Check for equal number of rows. */
                if(nrows_a != nrows_b) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of rows.");
                }
                /* Check for equal number of columns. */
                if(ncols_a != ncols_b) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of columns.");
                }
            
                /* Make a pointer to the real data in the first input matrix. */
                inMatrix_a = mxGetPr(prhs[0]);
                /* Make a pointer to the real data in the second input matrix. */
                inMatrix_b = mxGetPr(prhs[1]);
            
                /* Make the output matrix. */
                plhs[0] = mxCreateDoubleMatrix(nrows_a,ncols_a,mxREAL);
            
                /* Make a pointer to the real data in the output matrix. */
                outMatrix_c = mxGetPr(plhs[0]);
            
                /* Call the computational routine. */
                matrixSum(inMatrix_a,inMatrix_b,outMatrix_c,nrows_a*ncols_a);
            }
            

            Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

            % FILENAME:  myscript.m
            
            % Display the name of the compute node which runs this script.
            [c name] = system('hostname');
            name = name(1:length(name)-1);
            fprintf('myscript.m:  hostname:%s\n', name)
            
            % Call the separately compiled and dynamically linked MEX-file.
            A = [1,1,1;1,1,1]
            B = [2,2,2;2,2,2]
            C = matrixSum(A,B)
            
            quit;
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # -nodisplay:        run MATLAB in text mode; X11 server not needed
            # -singleCompThread: turn off implicit parallelism
            # -r:                read MATLAB program; use MATLAB JIT Accelerator
            matlab -nodisplay -singleCompThread -r myscript
            

            To access the MATLAB utility mex, load a MATLAB module. mex depends on shared libraries from GCC Version 4.3.x, which is not available on Carter; use the default GCC module instead (MEX does not officially support GCC 4.6 and later, so expect the version warning shown below). Compile matrixSum.c into a MATLAB-callable MEX-file:

            $ module load matlab
            $ module load gcc
            $ mex matrixSum.c
            

            The name of the MATLAB-callable MEX-file is matrixSum.mexa64. If you see the following warning, ignore it:

            Warning: You are using gcc version "4.6.2".  The version
                     currently supported with MEX is "4.3.4".
                     For a list of currently supported compilers see:
                     http://www.mathworks.com/support/compilers/current_release/
            

            Submit the job:

            $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
            

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            115204.carter-adm  myusername   standby  Job1                1   1    --  00:01 Q   --
            

            Job status shows one processor core (TSK) on one compute node (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a148.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            myscript.m:  hostname:carter-a148.rcac.purdue.edu
            
            A =
            
                 1     1     1
                 1     1     1
            
            
            B =
            
                 2     2     2
                 2     2     2
            
            
            C =
            
                 3     3     3
                 3     3     3
            

            Output shows the name of the compute node (a148) which processed this serial job. Also, this job shared the compute node with other jobs.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For the second example, prepare a MEX-file with a function containing CUDA directives, variables, and function calls. Use an appropriate filename, here named matrixMul.cu.

            Prepare a MATLAB script M-file with an appropriate filename, here named myscript.m:

            % FILENAME:  myscript.m
            % Multiply Two Square Matrices
            
            
            % Display the name of the compute node which runs this MATLAB script.
            [c name] = system('hostname');
            name = name(1:length(name)-1);
            fprintf('In myscript.m:    hostname:%s\n', name)
            
            % Setup double-precision matrix operands.
            N = 1024;        % size of matrix operands: N*N
            A = zeros(N,N);  % preallocate before assigning values
            for r=1:N
            for c=1:N
                A(r,c) = r*N+c;
            end
            end
            B = ones(N,N);
            C1 = zeros(N,N);
            C2 = zeros(N,N);
            
            % Time the matrix product on a CPU.
            tic;
            C1 = A*B;
            time_cpu = toc;
            
            % Time the matrix product on a GPGPU.
            % Call the separately compiled and dynamically linked MEX-file.
            %  - display the name of the compute node which runs the MEX function.
            %  - perform the matrix multiplication.
            %  - use the CUDA clock to get time spent in the GPGPU.
            tic;
            C2 = matrixMul(A,B);
            time_gpu = toc;
            
            % Compare the two matrix products.
            if (C1 ~= C2) fprintf('Error: C1 ~= C2\n'); end;
            
            % Display CPU and GPGPU times.
            fprintf('In myscript.m:    elapsed time in CPU:                   %f seconds\n', time_cpu);
            fprintf('In myscript.m:    elapsed time in MEX/CUDA:              %f seconds\n', time_gpu);
            
            quit;
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # -nodisplay:        run MATLAB in text mode; X11 server not needed
            # -singleCompThread: turn off implicit parallelism
            # -r:                read MATLAB program; use MATLAB JIT Accelerator
            matlab -nodisplay -singleCompThread -r myscript
            

            To access the Nvidia utility nvcc and the MATLAB utility mex, start an interactive session on a compute node with a GPGPU and load the CUDA and MATLAB modules. Navigate to your working directory. The command-line option -arch=sm_20 specifies Compute Capability 2.0 (double precision). mex depends on shared libraries from GCC Version 4.3.x, which is not available on Carter; use the default GCC instead (MEX does not officially support newer GCC versions, so expect the version warning shown below). Compile matrixMul.cu into a MATLAB-callable MEX-file:

            $ qsub -I -q standby-g -l nodes=1
            $ source /etc/profile.d/modules.sh     # tcsh/csh users: source /etc/profile.d/modules.csh instead
            $ module load cuda
            $ module load matlab
            $ cd myworkingdirectory
            $ nvcc -arch=sm_20 -I/apps/rhel5/MATLAB/R2011b/extern/include/ --cuda "matrixMul.cu" --output-file "matrixMul.cpp"
            $ mex -I/opt/cuda/include -L/opt/cuda/lib64 -L/apps/rhel5/MATLAB/R2011b/bin/glnxa64 -lcudart matrixMul.cpp
            $ logout
            

            The name of the MATLAB-callable MEX-file is matrixMul.mexa64. If you see the following warning, ignore it:

            Warning: You are using gcc version "4.4.6".  The version
                     currently supported with MEX is "4.3.4".
                     For a list of currently supported compilers see:
                     http://www.mathworks.com/support/compilers/current_release/

            Submit the job to a queue with compute nodes loaded with GPGPUs:

            $ qsub -q standby-g -l nodes=1,walltime=00:00:30 myjob.sub
            

            View job status:

            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            183010.carter-adm  myusername   standby- myjob.sub    4181   1   1    --  00:00 R   --
            

            Job status shows one processor core (TSK) on one compute node (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-g000.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            In myscript.m:    hostname:carter-g000.rcac.purdue.edu
            In matrixMul.cu:  hostname:carter-g000.rcac.purdue.edu
            In matrixMul.cu:  elapsed time in GPU (shared memory):   54.3 milliseconds
            In myscript.m:    elapsed time in CPU:                   0.511471 seconds
            In myscript.m:    elapsed time in MEX/CUDA:              3.359480 seconds
            

            Output shows that a processor core on one compute node (g000) processed the entire job: one processor core processed myjob.sub, myscript.m, and matrixMul.cu. Output also shows the GPU runtime from the CUDA clock (in milliseconds) and compares the MATLAB tic/toc times (in seconds) for the highly optimized MATLAB expression A*B running on the CPU against the MATLAB MEX function containing the CUDA code. The time spent running this column-major version in shared memory is comparable to the time of the row-major version in the global memory of the standalone program in the section "Compiling GPGPU/CUDA Programs"; perhaps the column-major ordering breaks the memory coalescing that the row-major version achieves. The MEX call is expensive compared to A*B. Also, this job shared the compute node with other jobs.

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.

            A similar example using single precision rather than double precision requires only a few changes. In the file matrixMul.cu, replace "double" with "float" and "mxDOUBLE_CLASS" with "mxSINGLE_CLASS." In the file myscript.m, use the MATLAB function single() to define single-precision matrices. In the nvcc command line, delete the command-line option -arch=sm_20.
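
            With those edits in place, the compile commands from above stay the same except that -arch=sm_20 is dropped from the nvcc line:

            $ nvcc -I/apps/rhel5/MATLAB/R2011b/extern/include/ --cuda "matrixMul.cu" --output-file "matrixMul.cpp"
            $ mex -I/opt/cuda/include -L/opt/cuda/lib64 -L/apps/rhel5/MATLAB/R2011b/bin/glnxa64 -lcudart matrixMul.cpp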

            For more information about the MATLAB MEX-file:

          • 7.1.9.13.9  MATLAB Standalone Program

            7.1.9.13.9  MATLAB Standalone Program

            A stand-alone MATLAB program is a C, C++, or Fortran program which calls user-written M-files and the same libraries which MATLAB uses. A stand-alone program has access to MATLAB objects, such as the array and matrix classes, as well as all the MATLAB algorithms. If you would like to implement performance-critical routines in C, C++, or Fortran and still call select MATLAB functions, a stand-alone MATLAB program may be a good option. It offers the possibility of substantially improved performance over MATLAB source code, especially for statements like for and while, while still allowing the use of specialized MATLAB functions where useful.

            This section illustrates how to submit a small, stand-alone, MATLAB program to a PBS queue. This C example calls compiled MATLAB functions which compute and display the inverse of a matrix. This example, when executed, does not use the MATLAB interpreter, so it neither requires nor checks out a MATLAB license.

            Prepare a MATLAB function which returns the inverse of a matrix. Use an appropriate filename, here named myinverse.m:

            % FILENAME:  myinverse.m
            
            function Y = myinverse (X)
            
                % Display name of compute node which runs this function.
                [c name] = system('hostname');
                fprintf('\n\nhostname:%s\n', name)
            
                % Invert a matrix.
                Y = inv(X);
            
            end
            

            Prepare a second MATLAB function which displays a matrix. Use an appropriate filename, here named myprintmatrix.m:

            % FILENAME:  myprintmatrix.m
            
            function myprintmatrix(A)
                     disp(A)
            end
            

            Prepare a C source file with a main function and the necessary external function interface and give it an appropriate filename, here named myprogram.c. Note that when you invoke a MATLAB function from C, the MATLAB function name appears "mangled". The C program invokes the MATLAB function myinverse using the name mlfMyinverse and the MATLAB function myprintmatrix using the name mlfMyprintmatrix. You must modify all MATLAB function names in this manner when you call them from outside MATLAB:

            /* FILENAME:  myprogram.c
            
            Inverse of:
            
                  A                B
               -------        ------------
               1  2  1         1 -3/2  1/2
               1  1  1   -->   1  -1   0
               3 -1  1        -2  7/2 -1/2
            
            
            
                1.0000   -1.5000    0.5000
                1.0000   -1.0000         0
               -2.0000    3.5000   -0.5000
            
            */
            
            
            #include <stdio.h>
            #include <math.h>
            #include "libmylib.h"     /* compiler-generated header file */
            
            int main (const int argc, char ** argv) {
            
                mxArray *A;   /* matrix containing input data           */
                mxArray *B;   /* matrix containing result               */
            
                int Nrow=3, Ncol=3;
                double a[] = {1,2,1,1,1,1,3,-1,1};  /* row-major order  */
                double b[] = {1,1,3,2,1,-1,1,1,1};  /* col-major order  */
                double *ptr;
            
                printf("Enter myprogram.c\n");
            
                libmylibInitialize();     /* call mylib initialization  */
            
                /* Make an uninitialized Nrow x Ncol MATLAB matrix.    */
                A = mxCreateDoubleMatrix(Nrow, Ncol, mxREAL);
            
                /* Initialize the MATLAB matrix.                        */
                ptr = (double *)mxGetPr(A);
                memcpy(ptr,b,Nrow*Ncol*sizeof(double));
            
                /* Call mlfMyinverse, the compiled version of myinverse.m. */
                mlfMyinverse(1,&B,A);
            
                /* Print the results. */
                mlfMyprintmatrix(B);
            
                /* Free the matrices allocated during this computation. */
                mxDestroyArray(A);
                mxDestroyArray(B);
            
                libmylibTerminate();     /* call mylib termination      */
            
                printf("Exit myprogram.c\n");
                return 0;
            }
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            ./myprogram
            

            To access the MATLAB Compiler mcc and mbuild, load a MATLAB module. The MATLAB Compiler, mcc, depends on shared libraries from GCC Version 4.3.x. This version is not available on Carter, but GCC Version 4.6.2 is compatible. Compile the user-written MATLAB functions into a dynamically loaded shared library, then compile the C program:

            $ module load matlab
            $ module load gcc
            $ mcc -W lib:libmylib -T link:lib myinverse.m myprintmatrix.m
            $ mbuild myprogram.c -L. -lmylib -I.
            

            Several new files appear after the compilation:

            libmylib.c
            libmylib.exports
            libmylib.h
            libmylib.so
            mccExcludedFiles.log
            myinverse
            myprintmatrix
            myprogram
            readme.txt
            

            The name of the compiled, stand-alone MATLAB program is myprogram. The name of the dynamically linked shared library of user-written MATLAB functions is libmylib.so.

            Submit the job:

            $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a145.rcac.purdue.edu
            Enter myprogram.c
            Warning: No display specified.  You will not be able to display graphics on the screen.
            Warning: Unable to load Java Runtime Environment: libjvm.so: cannot open shared object file: No such file or directory
            Warning: Disabling Java support
            
            
            hostname:carter-a145.rcac.purdue.edu
            
                1.0000   -1.5000    0.5000
                1.0000   -1.0000         0
               -2.0000    3.5000   -0.5000
            
            Exit myprogram.c
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about the MATLAB stand-alone program:

          • 7.1.9.13.10  MATLAB Engine Program

            7.1.9.13.10  MATLAB Engine Program

            The MATLAB Engine allows you to use MATLAB as a computation engine. A MATLAB Engine program is a standalone C, C++, or Fortran program which calls functions of the Engine Library, allowing you to start and end a MATLAB process, send data to and from MATLAB, and send commands for MATLAB to process.

            This section illustrates how to submit a small, stand-alone, MATLAB Engine program to a PBS queue. This C program calls functions of the Engine Library to compute the inverse of a matrix. This example, when executed, does not use the MATLAB interpreter, so it neither requires nor checks out a MATLAB license.

            Prepare a C program which computes the inverse of a matrix. Use an appropriate filename, here named myprogram.c:

            /* FILENAME:  myprogram.c
            
            A simple program to illustrate how to call MATLAB Engine functions
            from a C program.
            
            Inverse of:
            
                  A                B
               -------        ------------
               1  2  1         1 -3/2  1/2
               1  1  1   -->   1  -1   0
               3 -1  1        -2  7/2 -1/2
            
            */
            
            
            #include <stdlib.h>
            #include <stdio.h>
            #include <string.h>
            #include "engine.h"
            #define  BUFSIZE 256
            
            
            int main ()
            {
                Engine *ep;
                mxArray *A = NULL;
                mxArray *B = NULL;
                int Ncol=3, Nrow=3, col, row, ndx;
                double a[] = {1,1,3,2,1,-1,1,1,1};  /* col-major order  */
                double b[9] = {9,9,9,9,9,9,9,9,9};
                char buffer[BUFSIZE+1];
            
                printf("Enter myprogram.c\n");
            
                /* Call engOpen with a NULL string. This starts a MATLAB process */
                /* on the current host using the command "matlab".               */
                if (!(ep = engOpen(""))) {
                    fprintf(stderr, "\nCan't start MATLAB engine\n");
                    return EXIT_FAILURE;
                }
            
                buffer[BUFSIZE] = '\0';
                engOutputBuffer(ep, buffer, BUFSIZE);
            
                /* Make a variable for the data. */
                A = mxCreateDoubleMatrix(Ncol, Nrow, mxREAL);
                B = mxCreateDoubleMatrix(Ncol, Nrow, mxREAL);
                memcpy((void *)mxGetPr(A), (void *)a, sizeof(a));
            
                /* Place the variable A into the MATLAB workspace. */
                /* Place the variable B into the MATLAB workspace. */
                engPutVariable(ep, "A", A);
                engPutVariable(ep, "B", B);
            
                /* Evaluate and display the inverse. */
                engEvalString(ep, "B = inv(A)");
                printf("%s", buffer);
            
                /* Get variable B from the MATLAB workspace.       */
                /* Copy inverted matrix to a C array named "b".    */
                B = engGetVariable(ep, "B");
                memcpy((void *)b, (void *)mxGetPr(B), sizeof(b));
                ndx = 0;
                for (col=0;col<Ncol;++col) {
                    for (row=0;row<Nrow;++row) {
                        printf("  %5.1f", b[row*Nrow+col]);
                        ++ndx;
                    }
                    printf("\n");
                }
            
                /* Free memory.                       */
                mxDestroyArray(A);
                mxDestroyArray(B);
            
                /* Close MATLAB engine.               */
                engClose(ep);
            
                /* Exit C program.                    */
                printf("Exit myprogram.c\n");
                return EXIT_SUCCESS;
            }
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            ./myprogram
            

            Copy MATLAB file engopts.sh to the directory from which you intend to submit Engine jobs. Compile myprogram.c:

            $ cp /apps/rhel5/MATLAB/R2011b/bin/engopts.sh .
            $ mex -f engopts.sh myprogram.c
            

            Submit the job:

            $ qsub -l nodes=1,walltime=00:01:00 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a210.rcac.purdue.edu
            Enter myprogram.c
            >>
            B =
            
                1.0000   -1.5000    0.5000
                1.0000   -1.0000         0
               -2.0000    3.5000   -0.5000
            
                1.0   -1.5    0.5
                1.0   -1.0    0.0
               -2.0    3.5   -0.5
            Exit myprogram.c
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about the MATLAB stand-alone program:

          • 7.1.9.13.11  MATLAB Implicit Parallelism

            7.1.9.13.11  MATLAB Implicit Parallelism

            MATLAB implements implicit parallelism, which, in general, is the exploitation of parallelism inherent in many computations, such as matrix multiplication, linear algebra, and performing the same operation on a set of numbers. Implicit parallelism is a form of multithreading, which uses hardware to execute multiple threads efficiently. This is different from the explicit parallelism of the Parallel Computing Toolbox. Multithreading aims to increase the utilization of a single processor core by using thread-level as well as instruction-level parallelism.

            MATLAB offers implicit parallelism in the form of thread-parallel enabled functions. These functions run on the multicore processors of typical Linux clusters. Because these processor cores share a common memory, many MATLAB functions have multithreading potential. Whether a function runs serially or with multithreading depends on the type of operation (for example, vector operations), the particular application or algorithm, and the amount of computation (array size).
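
            For illustration only (the specific functions and array-size thresholds vary by MATLAB release), an element-wise operation on a large array is the kind of computation that may run multithreaded:

            % Sketch:  element-wise math on a large array is a thread-parallel candidate;
            % the same operation on a small array typically runs serially.
            x = rand(5000);            % a large 5000-by-5000 matrix
            y = exp(sin(x)) + x.^2;    % element-wise operations that MATLAB may multithread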

            When your job triggers implicit parallelism, it attempts to allocate its threads on all processor cores of the compute node on which the MATLAB client is running, including processor cores running other jobs. This competition can degrade the performance of all jobs running on the node. If an affected processor core participates in a larger, distributed-memory, parallel job involving many other nodes, then performance degradation can become much more widespread.

            When you know that you are coding a serial job but are unsure whether you are using thread-parallel enabled operations, run MATLAB with implicit parallelism turned off. Beginning with R2009b, you can turn multithreading off by starting MATLAB with the -singleCompThread option:

            $ matlab -nodisplay -singleCompThread -r mymatlabprogram
            
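
            If it is more convenient to limit threading from inside the program itself, the function maxNumCompThreads offers an alternative; the following is a sketch only (MathWorks has announced the function's eventual removal, so the -singleCompThread option above remains the documented route):

            % Sketch only:  restrict implicit parallelism from within an M-file.
            % maxNumCompThreads may print a deprecation warning.
            nthreads = maxNumCompThreads;   % query the current computational thread limit
            maxNumCompThreads(1);           % request a single computational thread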

            When you are using implicit parallelism, request exclusive access to a compute node by requesting all cores which are physically available on a node of a compute cluster:

            $ qsub -l nodes=1:ppn=16,walltime=00:01:00 myjob.sub
            

            Parallel Computing Toolbox commands, such as spmd, preempt multithreading. Note that opening a MATLAB pool neither prevents multithreading nor changes the thread count in effect.

            For more information about MATLAB's implicit parallelism:

          • 7.1.9.13.12  MATLAB Parallel Computing Toolbox (parfor)

            7.1.9.13.12  MATLAB Parallel Computing Toolbox (parfor)

            The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. It offers a shared-memory computing environment with a maximum of 12 workers (labs, threads; starting in version R2011a) running on the local cluster profile in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the fine-grained parallelism of a parallel for loop (parfor) in a pool job. Areas of application include for loops with independent iterations.

            The following examples illustrate three methods of submitting a small, parallel, MATLAB program with a parallel loop (parfor statement) as a batch, MATLAB pool job to a PBS queue. This MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each iteration of the parfor loop. The system function hostname returns two values: a numerical status code and the name of the compute node that runs each iteration of the parallel loop.

            The first method uses the PBS qsub command to submit to a compute node a MATLAB client which calls the MATLAB batch() function with a user-defined cluster profile.

            The second method uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS configuration which scatters the MATLAB workers onto different compute nodes.

            The third method uses the MATLAB compiler mcc and a user-defined Torque cluster profile to compile a MATLAB M-file and submits the compiled file to a PBS queue.

            Prepare a MATLAB pool program in the form of a MATLAB script M-file and a MATLAB function M-file with appropriate filenames, here named myscript.m and myfunction.m:

            % FILENAME:  myscript.m
            
            % SERIAL REGION
            [c name] = system('hostname');
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            numlabs = matlabpool('size');
            fprintf('                hostname                         numlabs  labindex  iteration\n')
            fprintf('                -------------------------------  -------  --------  ---------\n')
            tic;
            
            % PARALLEL LOOP
            parfor i = 1:8
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
                pause(2);
            end
            
            % SERIAL REGION
            elapsed_time = toc;        % get elapsed time in parallel loop
            fprintf('\n')
            [c name] = system('hostname');
            name = name(1:length(name)-1);
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
            

            % FILENAME:  myfunction.m
            
            function result = myfunction ()
            
                % SERIAL REGION
                % Variable "result" is a "reduction" variable.
                [c name] = system('hostname');
                result = sprintf('SERIAL REGION:  hostname:%s', name);
                numlabs = matlabpool('size');
                r = sprintf('                hostname                         numlabs  labindex  iteration');
                result = strvcat(result,r);
                r = sprintf('                -------------------------------  -------  --------  ---------');
                result = strvcat(result,r);
                tic;
            
                % PARALLEL LOOP
                parfor i = 1:8
                    [c name] = system('hostname');
                    name = name(1:length(name)-1);
                    r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
                    result = strvcat(result,r);
                    pause(2);
                end
            
                % SERIAL REGION
                elapsed_time = toc;          % get elapsed time in parallel loop
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                r = sprintf('\nSERIAL REGION:  hostname:%s', name);
                result = strvcat(result,r);
                r = sprintf('Elapsed time in parallel loop:   %f', elapsed_time);
                result = strvcat(result,r);
            
            end
            

            Both M-files display the names of all compute nodes which run the job. The parfor statement does not set the values of variables numlabs or labindex, but function matlabpool() can return the pool size. The M-file script uses fprintf() to display the results. The M-file function returns a single value which contains a concatenation of the results.

            The execution of a pool job starts with a worker (batch session) executing the statements of the first serial region up to the parfor block, where it pauses. A set of workers (the pool) executes the parfor block. When they finish, the batch session resumes by executing the second serial region. The code displays the names of the compute nodes running the batch session and the worker pool.

            The first method of job submission uses the PBS qsub command to submit a job to a PBS queue and the MATLAB function batch(). The batch session distributes the independent iterations of the loop to the workers of the pool. The workers of the pool simultaneously process their respective portions of the workload of the parallel loop, so the parallel loop might run faster than the equivalent serial version. A pool size of N requires N+1 workers (processor cores). The source code is a MATLAB M-file (the MATLAB function batch() accepts either a script M-file or a function M-file).

            This method uses the batch() function and either the M-file script myscript.m or the M-file function myfunction.m. In this case the MATLAB client runs on a compute node and uses a user-defined cluster profile.

            Prepare a MATLAB script M-file that calls the MATLAB function batch() to create a four-lab pool, run the MATLAB code in the file myscript.m under the 'mypbsprofile' cluster profile, and capture job output in the diary. Use an appropriate filename, here named mylclbatch.m:

            % FILENAME:  mylclbatch.m
            
            !echo "mylclbatch.m"
            !hostname
            
            pjob=batch('myscript','Matlabpool',4,'Profile','mypbsprofile','CaptureDiary',true);
            pjob.wait;
            pjob.diary
            quit;
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            matlab -nodisplay -r mylclbatch
            

            Submit the job as a single compute node with one processor core and request one PCT license:

            $ qsub -l nodes=1:ppn=1,walltime=01:00:00,gres=Parallel_Computing_Toolbox+1 myjob.sub
            

            One processor core runs myjob.sub and mylclbatch.m.

            This job submission causes a second job submission.

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            199025.carter-adm  myusername   standby  myjob.sub   30197   1   1    --  00:01 R 00:00
            
            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            199025.carter-adm  myusername   standby  myjob.sub   30197   1   1    --  00:01 R 00:00
            199026.carter-adm  myusername   standby  Job1          668   4   4    --  00:01 R 00:00
            

            At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a000.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            
            mylclbatch.m
            carter-a000.rcac.purdue.edu
            SERIAL REGION:  hostname:carter-a000.rcac.purdue.edu
            
                            hostname                         numlabs  labindex  iteration
                            -------------------------------  -------  --------  ---------
            PARALLEL LOOP:  carter-a001.rcac.purdue.edu            4         1          2
            PARALLEL LOOP:  carter-a002.rcac.purdue.edu            4         1          4
            PARALLEL LOOP:  carter-a001.rcac.purdue.edu            4         1          5
            PARALLEL LOOP:  carter-a002.rcac.purdue.edu            4         1          6
            PARALLEL LOOP:  carter-a003.rcac.purdue.edu            4         1          1
            PARALLEL LOOP:  carter-a003.rcac.purdue.edu            4         1          3
            PARALLEL LOOP:  carter-a004.rcac.purdue.edu            4         1          7
            PARALLEL LOOP:  carter-a004.rcac.purdue.edu            4         1          8
            
            SERIAL REGION:  hostname:carter-a000.rcac.purdue.edu
            Elapsed time in parallel loop:   5.411486
            

            Output shows that the property Matlabpool (4) defined the number of labs in the pool which processed the parallel for loop. While the output does not explicitly show the fifth worker, that worker runs the batch session, which includes the two serial portions of the MATLAB pool program. Because the MATLAB pool requires the worker running the batch session in addition to the N labs in the pool, there must be at least N+1 processor cores available on the cluster.

            Output shows that one processor core on compute node a000 processed myjob.sub and mylclbatch.m, and that another processor core on the same node processed the batch session myscript.m, which includes the two serial regions, while four processor cores on compute nodes a001 through a004 processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. Output shows the iterations of the parfor loop in scrambled order since the labs process each iteration independently of the other iterations. Finally, the output shows the time that the four labs spent running the eight iterations of the parfor loop, which decreases as the number of processor cores increases.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            To apply this first method of job submission to a function M-file, modify mylclbatch.m with one of the following sequences:

            pjob=batch('myfunction','Matlabpool',4,'Profile','mypbsprofile','CaptureDiary',true);
            pjob.wait;
            pjob.diary
            

            pjob=batch('myfunction',1,{},'Matlabpool',4,'Profile','mypbsprofile');
            pjob.wait;
            result = getAllOutputArguments(pjob);
            result{1}
            

            pjob=batch(@myfunction,1,{},'Matlabpool',4,'Profile','mypbsprofile');
            pjob.wait;
            result = getAllOutputArguments(pjob);
            result{1}
            

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Also, consider increasing the size of the MATLAB pool, the value of the property Matlabpool which appears as an argument in the call of function batch().
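
            For example, here is a sketch (values are illustrative only) of mylclbatch.m enlarged to an eight-lab pool; the matching qsub request would also need at least nine available workers and a longer wall time:

            % FILENAME:  mylclbatch.m  (sketch with a larger pool; values illustrative)
            
            pjob=batch('myscript','Matlabpool',8,'Profile','mypbsprofile','CaptureDiary',true);
            pjob.wait;
            pjob.diary
            quit;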

            Specifying a MATLAB pool with 12 labs means a total of 13 workers (the 12 labs plus the batch session). This exceeds the 12-worker limit of the 'local' configuration in MATLAB R2011b. The relevant lines of code and the resulting error follow:

            pjob=batch('myscript','Matlabpool',12,'Profile','local','CaptureDiary',true);
            
            $ qsub -l nodes=1:ppn=14,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1 myjob.sub
            
            {Error using batch (line 172)
            You requested a minimum of 13 workers but only 12 workers are allowed with the
            local scheduler.
            
            Error in mylclbatch (line 6)
            pjob=batch('myscript','Matlabpool',12,'Profile','local','CaptureDiary',true);}
            

            The second method uses either a MATLAB script M-file or a MATLAB function M-file, and uses a user-defined cluster profile.

            Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements:

            % FILENAME:  myscript.m
            
            % SERIAL REGION
            [c name] = system('hostname');
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            matlabpool open 4;
            numlabs = matlabpool('size');
            fprintf('                hostname                         numlabs  labindex  iteration\n')
            fprintf('                -------------------------------  -------  --------  ---------\n')
            tic;
            
            % PARALLEL LOOP
            parfor i = 1:8
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
                pause(2);
            end
            
            % SERIAL REGION
            elapsed_time = toc;          % get elapsed time in parallel loop
            matlabpool close;
            fprintf('\n')
            [c name] = system('hostname');
            name = name(1:length(name)-1);
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
            quit;
            

            % FILENAME:  myfunction.m
            
            function result = myfunction ()
            
                % SERIAL REGION
                % Variable "result" is a "reduction" variable.
                [c name] = system('hostname');
                result = sprintf('SERIAL REGION:  hostname:%s', name);
                matlabpool open 4;
                numlabs = matlabpool('size');
                r = sprintf('                hostname                         numlabs  labindex  iteration');
                result = strvcat(result,r);
                r = sprintf('                -------------------------------  -------  --------  ---------');
                result = strvcat(result,r);
                tic;
            
                % PARALLEL LOOP
                parfor i = 1:8
                    [c name] = system('hostname');
                    name = name(1:length(name)-1);
                    r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
                    result = strvcat(result,r);
                    pause(2);
                end
            
                % SERIAL REGION
                elapsed_time = toc;          % get elapsed time in parallel loop
                matlabpool close;
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                r = sprintf('\nSERIAL REGION:  hostname:%s', name);
                result = strvcat(result,r);
                r = sprintf('elapsed time:   %f', elapsed_time);
                result = strvcat(result,r);
            
            end
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            matlab -nodisplay -r myscript
            

            Run MATLAB to set the default parallel configuration to your PBS configuration:

            $ matlab -nodisplay
            >> parallel.defaultClusterProfile('mypbsprofile');
            >> quit;
            $
            

            Submit the job as a single compute node with one processor core and request one PCT license:

            $ qsub -l nodes=1:ppn=1,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1 myjob.sub
            

            This job submission causes a second job submission.

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            332026.carter-adm  myusername   standby  myjob.sub   31850   1   1    --  00:01 R 00:00
            
            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            332026.carter-adm  myusername   standby  myjob.sub   31850   1   1    --  00:01 R 00:00
            332028.carter-adm  myusername   standby  Job1          668   4   4    --  00:01 R 00:00
            

            At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a000.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            SERIAL REGION:  hostname:carter-a000.rcac.purdue.edu
            
            Starting matlabpool using the 'mypbsprofile' configuration ... connected to 4 labs.
                            hostname                         numlabs  labindex  iteration
                            -------------------------------  -------  --------  ---------
            PARALLEL LOOP:  carter-a007.rcac.purdue.edu            4         1          2
            PARALLEL LOOP:  carter-a007.rcac.purdue.edu            4         1          4
            PARALLEL LOOP:  carter-a008.rcac.purdue.edu            4         1          5
            PARALLEL LOOP:  carter-a008.rcac.purdue.edu            4         1          6
            PARALLEL LOOP:  carter-a009.rcac.purdue.edu            4         1          3
            PARALLEL LOOP:  carter-a009.rcac.purdue.edu            4         1          1
            PARALLEL LOOP:  carter-a010.rcac.purdue.edu            4         1          7
            PARALLEL LOOP:  carter-a010.rcac.purdue.edu            4         1          8
            
            Sending a stop signal to all the labs ... stopped.
            
            
            SERIAL REGION:  hostname:carter-a000.rcac.purdue.edu
            Elapsed time in parallel loop:   3.382151
            

            Output shows the name of the compute node (a000) that processed the job submission file myjob.sub and the two serial regions. The job submission scattered the four labs (four processor cores) that processed the iterations of the parallel loop among four different compute nodes (a007,a008,a009,a010). The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop; labs process each iteration independently of the other iterations, so output from the iterations appears in random order. Finally, output shows the time that the four labs spent running the eight iterations of the parfor loop, which decreases as the number of processor cores increases.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsprofile by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased.
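
            As an illustrative sketch only, the pool opened inside myscript.m or myfunction.m can be enlarged in the same way; the number of DCS licenses and the wall time in mypbsprofile must cover the larger pool:

            % Sketch (illustrative):  open an eight-lab pool, confirm its size, close it.
            matlabpool open 8;
            fprintf('pool size: %d\n', matlabpool('size'));
            matlabpool close;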

            The third method of job submission uses the MATLAB Compiler mcc to compile a MATLAB function M-file with a PBS configuration and submits the compiled file to a PBS queue. This method uses a MATLAB function M-file and a user-defined cluster profile.

            Modify the MATLAB script M-file myscript.m with matlabpool and quit statements or the MATLAB function M-file myfunction.m with matlabpool statements. Proceed with the MATLAB function M-file myfunction.m (when compiling code that contains a parfor statement, the parfor must appear in a function, not in a script; this is a known limitation when compiling with mcc):

            % FILENAME:  myscript.m
            
            warning off all;
            
            % SERIAL REGION
            [c name] = system('hostname');
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            matlabpool open 4;
            numlabs = matlabpool('size');
            fprintf('                hostname                         numlabs  labindex  iteration\n')
            fprintf('                -------------------------------  -------  --------  ---------\n')
            tic;
            
            % PARALLEL LOOP
            parfor i = 1:8
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                fprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d\n', name,numlabs,labindex,i)
                pause(2);
            end
            
            % SERIAL REGION
            elapsed_time = toc;          % get elapsed time in parallel loop
            matlabpool close;
            fprintf('\n')
            [c name] = system('hostname');
            name = name(1:length(name)-1);
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            fprintf('Elapsed time in parallel loop:   %f\n', elapsed_time)
            quit;
            

            % FILENAME:  myfunction.m
            
            function result = myfunction ()
            
                warning off all;
            
                % SERIAL REGION
                % Variable "result" is a "reduction" variable.
                [c name] = system('hostname');
                result = sprintf('SERIAL REGION:  hostname:%s', name);
                matlabpool open 4;
                numlabs = matlabpool('size');
                r = sprintf('                hostname                         numlabs  labindex  iteration');
                result = strvcat(result,r);
                r = sprintf('                -------------------------------  -------  --------  ---------');
                result = strvcat(result,r);
                tic;
            
                % PARALLEL LOOP
                parfor i = 1:8
                    [c name] = system('hostname');
                    name = name(1:length(name)-1);
                    r = sprintf('PARALLEL LOOP:  %-31s  %7d  %8d  %9d', name,numlabs,labindex,i);
                    result = strvcat(result,r);
                    pause(2);
                end
            
                % SERIAL REGION
                elapsed_time = toc;          % get elapsed time in parallel loop
                matlabpool close;
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                r = sprintf('\nSERIAL REGION:  hostname:%s', name);
                result = strvcat(result,r);
                r = sprintf('Elapsed time in parallel loop:   %f', elapsed_time);
                result = strvcat(result,r);
            
            end
            

            Prepare a wrapper script which receives and displays the result of myfunction.m. Use an appropriate filename, here named mywrapper.m:

            % FILENAME:  mywrapper.m
            
            result = myfunction();
            disp(result)
            quit;
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            ./run_mywrapper.sh /apps/rhel5/MATLAB/R2012a
            

            On a front end, load modules for MATLAB and GCC; the MATLAB Compiler mcc depends on shared libraries from GCC, which is available on Carter. Set the default cluster profile to the user-defined cluster configuration and quit MATLAB. Compile both the MATLAB script M-file mywrapper.m and the MATLAB function M-file myfunction.m:

            $ module load matlab
            $ module load gcc
            $ matlab -nodisplay
            >> defaultParallelConfig('mypbsprofile');
            >> quit
            $ mcc -m mywrapper.m myfunction.m
            $ mkdir test
            $ cp mywrapper test
            $ cp run_mywrapper.sh test
            $ cp myjob.sub test
            $ cd test
            

            To obtain the name of the compute node which runs the compiler-generated script run_mywrapper.sh, insert the Linux commands echo and hostname before the first echo statement so that the script appears as follows:

            #!/bin/sh
            # script for execution of deployed applications
            #
            # Sets up the MCR environment for the current $ARCH and executes
            # the specified command.
            #
            exe_name=$0
            exe_dir=`dirname "$0"`
            
            echo "run_mywrapper.sh"
            hostname
            
            echo "------------------------------------------"
            if [ "x$1" = "x" ]; then
              echo Usage:
              echo    $0 \<deployedMCRroot\> args
            else
              echo Setting up environment variables
              MCRROOT="$1"
              echo ---
              LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
              LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
              LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
                MCRJRE=${MCRROOT}/sys/java/jre/glnxa64/jre/lib/amd64 ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/native_threads ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/server ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE}/client ;
                LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRJRE} ;
              XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
              export LD_LIBRARY_PATH;
              export XAPPLRESDIR;
              echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
              shift 1
              "${exe_dir}"/myfunction $*
            fi
            exit
            

            Submit the job as a single compute node with one processor core and request four DCS licenses:

            $ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=MATLAB_Distrib_Comp_Server+4 myjob.sub
            

            This job runs myjob.sub on a compute node, which in turn submits the parallel job. The first job must run at least as long as the job with the parallel loop, since it collects the results of the parallel job.

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            115292.carter-adm  myusername   standby  myjob.sub   28611   1   1    --  00:05 R 00:00
            
            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            115292.carter-adm  myusername   standby  myjob.sub   28611   1   1    --  00:05 R 00:00
            115293.carter-adm  myusername   standby  Job1        29390   4   4    --  00:01 R 00:00
            

            At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows that this job has submitted a second job with four processor cores (TSK) on four compute nodes (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a021.rcac.purdue.edu
            run_mywrapper.sh
            carter-a021.rcac.purdue.edu
            ------------------------------------------
            Setting up environment variables
            ---
            LD_LIBRARY_PATH is .:/apps/rhel5/MATLAB_R2010a/runtime/glnxa64:/apps/rhel5/MATLAB_R2010a/bin/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/os/glnxa64:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/server:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64/client:/apps/rhel5/MATLAB_R2010a/sys/java/jre/glnxa64/jre/lib/amd64
            Warning: No display specified.  You will not be able to display graphics on the screen.
            
            SERIAL REGION:  hostname:carter-a021.rcac.purdue.edu
            
            Starting matlabpool using the 'mypbsprofile' configuration ... connected to 4 labs.
                            hostname                         numlabs  labindex  iteration
                            -------------------------------  -------  --------  ---------
            PARALLEL LOOP:  carter-a021.rcac.purdue.edu            4         1          2
            PARALLEL LOOP:  carter-a022.rcac.purdue.edu            4         1          4
            PARALLEL LOOP:  carter-a023.rcac.purdue.edu            4         1          5
            PARALLEL LOOP:  carter-a024.rcac.purdue.edu            4         1          6
            PARALLEL LOOP:  carter-a021.rcac.purdue.edu            4         1          1
            PARALLEL LOOP:  carter-a022.rcac.purdue.edu            4         1          3
            PARALLEL LOOP:  carter-a023.rcac.purdue.edu            4         1          8
            PARALLEL LOOP:  carter-a024.rcac.purdue.edu            4         1          7
            Sending a stop signal to all the labs ... stopped.
            Did not find any pre-existing parallel jobs created by matlabpool.
            
            SERIAL REGION:  hostname:carter-a021.rcac.purdue.edu
            Elapsed time in parallel loop:   5.125206
            

            Output shows the name of the compute node (a021) that ran the job submission file myjob.sub and the compiler-generated script run_mywrapper.sh, the name of the compute node (a021) that ran the two serial regions, and the names of the four compute nodes (a021,a022,a023,a024) that processed the iterations of the parallel loop. The parfor loop does not set variable numlabs to the number of labs in the pool; nor does it give to each lab in the pool a unique value for variable labindex. The scrambled order of the iterations displayed in the output comes from the parallel nature of the parfor loop. Finally, the output shows the time that the four labs spent running the eight iterations of the parfor loop, which decreases as the number of processor cores increases.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsprofile by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments. Also, consider increasing the size of the MATLAB pool, the value which appears in the statement matlabpool open. The maximum possible size of the pool is the number of DCS licenses purchased. Increase the value of MATLAB_Distrib_Comp_Server in the qsub command to match the new size of the pool.

            For more information about MATLAB Parallel Computing Toolbox:

          • 7.1.9.13.13  MATLAB Parallel Computing Toolbox (spmd)

            7.1.9.13.13  MATLAB Parallel Computing Toolbox (spmd)

            The MATLAB Parallel Computing Toolbox (PCT) extends the MATLAB language with high-level, parallel-processing features such as parallel for loops, parallel regions, message passing, distributed arrays, and parallel numerical methods. PCT enables task and data parallelism on a multicore processor. It offers a shared-memory computing environment with a maximum of eight MATLAB workers (labs, threads; version R2009a) and 12 workers (labs, threads; starting in version R2011a) running on the local configuration in addition to your MATLAB client. Moreover, the MATLAB Distributed Computing Server (DCS) scales PCT applications up to the limit of your DCS licenses. This section illustrates the coarse-grained parallelism of a parallel region (spmd) in a pool job. Areas of application include SPMD (single program, multiple data) problems.

            This section illustrates how to submit a small, parallel, MATLAB program with a parallel region (spmd statement) as a batch, MATLAB pool job to a PBS queue. The MATLAB program prints the name of the run host and shows the values of variables numlabs and labindex for each parallel region of the pool. The system function hostname returns two values: a numerical status code and the name of the compute node that runs each parallel region.

            This example uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS cluster profile which scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server, so it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the spmd statement. This job runs completely off the front end.

            Prepare a MATLAB script M-file called myscript.m and a MATLAB function M-file myfunction.m with matlabpool statements:

            % FILENAME:  myscript.m
            
            % SERIAL REGION
            [c name] = system('hostname');
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            matlabpool open 4;
            fprintf('                    hostname                         numlabs  labindex\n')
            fprintf('                    -------------------------------  -------  --------\n')
            tic;
            
            % PARALLEL REGION
            spmd
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                fprintf('PARALLEL REGION:  %-31s  %7d  %8d\n', name,numlabs,labindex)
                pause(2);
            end
            
            % SERIAL REGION
            elapsed_time = toc;          % get elapsed time in parallel region
            matlabpool close;
            fprintf('\n')
            [c name] = system('hostname');
            name = name(1:length(name)-1);
            fprintf('SERIAL REGION:  hostname:%s\n', name)
            fprintf('Elapsed time in parallel region:   %f\n', elapsed_time)
            quit;
            

            % FILENAME:  myfunction.m
            
            function result = myfunction ()
            
                % SERIAL REGION
                % Variable "r" is a "composite object."
                [c name] = system('hostname');
                result = sprintf('SERIAL REGION:  hostname:%s', name);
                matlabpool open 4;
                r = sprintf('                  hostname                         numlabs  labindex');
                result = strvcat(result,r);
                r = sprintf('                  -------------------------------  -------  --------');
                result = strvcat(result,r);
                tic;
            
                % PARALLEL REGION
                spmd
                    [c name] = system('hostname');
                    name = name(1:length(name)-1);
                    r = sprintf('PARALLEL REGION:  %-31s  %7d  %8d', name,numlabs,labindex);
                    pause(2);
                end
            
                % SERIAL REGION
                elapsed_time = toc;          % get elapsed time in parallel region
                for ndx=1:length(r)          % concatenate composite object "r"
                    result = strvcat(result,r{ndx});
                end
                matlabpool close;
                [c name] = system('hostname');
                name = name(1:length(name)-1);
                r = sprintf('\nSERIAL REGION:  hostname:%s', name);
                result = strvcat(result,r);
                r = sprintf('elapsed time:   %f', elapsed_time);
                result = strvcat(result,r);
            
            end
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            matlab -nodisplay -r myscript
            

            Run MATLAB to set the default parallel configuration to your PBS configuration:

            $ matlab -nodisplay
            >> parallel.defaultClusterProfile('mypbsprofile');
            >> quit;
            $
            

            Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

            $ qsub -l nodes=1:ppn=1,walltime=00:01:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub
            

            This job submission causes a second job submission.

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            332026.carter-adm  myusername   standby  myjob.sub   31850   1   1    --  00:01 R 00:00
            
            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            332026.carter-adm  myusername   standby  myjob.sub   31850   1   1    --  00:01 R 00:00
            332028.carter-adm  myusername   standby  Job1          668   4   4    --  00:01 R 00:00
            

            At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a001.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            SERIAL REGION:  hostname:carter-a001.rcac.purdue.edu
            
            Starting matlabpool using the 'mypbsprofile' profile ... connected to 4 labs.
                                hostname                         numlabs  labindex
                                -------------------------------  -------  --------
            Lab 2:
              PARALLEL REGION:  carter-a002.rcac.purdue.edu            4         2
            Lab 1:
              PARALLEL REGION:  carter-a001.rcac.purdue.edu            4         1
            Lab 3:
              PARALLEL REGION:  carter-a003.rcac.purdue.edu            4         3
            Lab 4:
              PARALLEL REGION:  carter-a004.rcac.purdue.edu            4         4
            
            Sending a stop signal to all the labs ... stopped.
            
            
            SERIAL REGION:  hostname:carter-a001.rcac.purdue.edu
            Elapsed time in parallel region:   3.382151
            

            Output shows the name of one compute node (a001) that processed the job submission file myjob.sub and the two serial regions. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a001,a002,a003,a004) that processed the four parallel regions. The total elapsed time demonstrates that the jobs ran in parallel.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about MATLAB Parallel Computing Toolbox:

          • 7.1.9.13.14  MATLAB Distributed Computing Server (parallel job)

            7.1.9.13.14  MATLAB Distributed Computing Server (parallel job)

            The MATLAB Parallel Computing Toolbox (PCT) offers a parallel job via the MATLAB Distributed Computing Server (DCS). The tasks of a parallel job are identical, run simultaneously on several MATLAB workers (labs), and communicate with each other. This section illustrates an MPI-like program. Areas of application include distributed arrays and message passing.

            This section illustrates how to submit a small, MATLAB parallel job with four workers running one MPI-like task to a PBS queue. The MATLAB program broadcasts an integer, which might be the number of slices of a numerical integration, to four workers and gathers the names of the compute nodes running the workers and the lab IDs of the workers. The system function hostname returns two values: a numerical status code and the name of the compute node that runs the program.

            This example uses the PBS qsub command to submit to compute nodes a MATLAB client which interprets an M-file with a user-defined PBS cluster profile which scatters the MATLAB workers onto different compute nodes. This method uses the MATLAB interpreter, the Parallel Computing Toolbox, and the Distributed Computing Server, so it requires and checks out six licenses: one MATLAB license for the client running on the compute node, one PCT license, and four DCS licenses. Four DCS licenses run the four copies of the parallel job. This job runs completely off the front end.

            Prepare a MATLAB script M-file myscript.m and a MATLAB function M-file myfunction.m:

            % FILENAME:  myscript.m
            
            % Specify pool size.
            % Convert the parallel job to a pool job.
            matlabpool open 4;
            spmd
            
            
            if labindex == 1
                % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
                N = labBroadcast(1,int64(1000));
            else
                % Each lab (rank) receives the broadcast value from lab (rank) #1.
                N = labBroadcast(1);
            end
            
            % Form a string with host name, total number of labs, lab ID, and broadcast value.
            [c name] =system('hostname');
            name = name(1:length(name)-1);
            fmt = num2str(floor(log10(numlabs))+1);
            str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);
            
            % Apply global concatenate to all str's.
            % Store the concatenation of str's in the first dimension (row) and on lab #1.
            result = gcat(str,1,1);
            if labindex == 1
                disp(result)
            end
            
            
            end   % spmd
            matlabpool close force;
            quit;
            

            % FILENAME:  myfunction.m
            
            
            function result = myfunction ()
            
                result = 0;
            
                % Specify pool size.
                % Convert the parallel job to a pool job.
                matlabpool open 4;
                spmd
            
                if labindex == 1
                    % Lab (rank) #1 broadcasts an integer value to other labs (ranks).
                    N = labBroadcast(1,int64(1000));
                else
                    % Each lab (rank) receives the broadcast value from lab (rank) #1.
                    N = labBroadcast(1);
                end
            
                % Form a string with host name, total number of labs, lab ID, and broadcast value.
                [c name] =system('hostname');
                name = name(1:length(name)-1);
                fmt = num2str(floor(log10(numlabs))+1);
                str = sprintf(['%s:%d:%' fmt 'd:%d   '], name,numlabs,labindex,N);
            
                % Apply global concatenate to all str's.
                % Store the concatenation of str's in the first dimension (row) and on lab #1.
                rslt = gcat(str,1,1);
            
                end   % spmd
                result = rslt{1};
                matlabpool close force;
            
            end   % function
            

            Also, prepare a job submission file with an appropriate filename, here named myjob.sub. Run with the name of either the script M-file or the function M-file:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # -nodisplay: run MATLAB in text mode; X11 server not needed
            # -r:         read MATLAB program; use MATLAB JIT Accelerator
            matlab -nodisplay -r myscript
            

            Run MATLAB to set the default parallel configuration to your PBS configuration:

            $ matlab -nodisplay
            >> defaultParallelConfig('mypbsconfig');
            >> quit;
            $
            

            Submit the job as a single compute node with one processor core and request one PCT license and four DCS licenses:

            $ qsub -l nodes=1:ppn=1,walltime=00:05:00,gres=Parallel_Computing_Toolbox+1%MATLAB_Distrib_Comp_Server+4 myjob.sub
            

            This job submission causes a second job submission.

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            465534.carter-adm  myusername   standby  myjob.sub    5620   1   1    --  00:05 R 00:00
            
            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            465534.carter-adm  myusername   standby  myjob.sub    5620   1   1    --  00:05 R 00:00
            465545.carter-adm  myusername   standby  Job2          --    4   4    --  00:01 R   --
            

            At first, job status shows one processor core (TSK) on one compute node (NDS). Then, job status shows four processor cores (TSK) on four compute nodes (NDS).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-a006.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            >Starting matlabpool using the 'mypbsconfig' configuration ... connected to 4 labs.
            Lab 1:
              carter-a006.rcac.purdue.edu:4:1:1000
              carter-a007.rcac.purdue.edu:4:2:1000
              carter-a008.rcac.purdue.edu:4:3:1000
              carter-a009.rcac.purdue.edu:4:4:1000
            Sending a stop signal to all the labs ... stopped.
            Did not find any pre-existing parallel jobs created by matlabpool.
            

            Output shows the name of one compute node (a006) that processed the job submission file myjob.sub. The job submission scattered four processor cores (four MATLAB labs) among four different compute nodes (a006,a007,a008,a009) that processed the four parallel regions. Output also shows that the value of variable numlabs is the number of labs (4) and that the program assigned to each lab a unique value for variable labindex. There are four labs, so there are four lab IDs. Each lab received the broadcast value: 1,000. Function gcat() collected onto lab 1 the string from each parallel region, including the name of the compute node that ran it.

            Any output written to standard error will appear in myjob.sub.emyjobid.

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job. Secondly, increase the wall time of mypbsconfig by using the Cluster Profile Manager in the Parallel menu to enter a new wall time in the property SubmitArguments.

            For more information about parallel jobs:

          • 7.1.9.13.15  MATLAB GPGPU Program (datatype gpuArray)

            7.1.9.13.15  MATLAB GPGPU Program (datatype gpuArray)

            The MATLAB datatype gpuArray, a feature of the Parallel Computing Toolbox, and MATLAB functions overloaded to handle this datatype let you accelerate your MATLAB applications with CUDA GPGPU computing technology more easily than using CUDA code in C or Fortran programs. You do not have to learn the intricacies of GPGPU architectures or low-level GPGPU computing libraries. This section illustrates how to access the massive, fine-grained parallelism of a GPGPU from MATLAB. Areas of application include SIMT (Single Instruction, Multiple Threads) problems.

            You use GPGPUs with MATLAB through MATLAB's Parallel Computing Toolbox (PCT). This method, when executed, uses the MATLAB interpreter and the Parallel Computing Toolbox, so it requires and checks out two licenses: one MATLAB license for the client running on the compute node and one PCT license. The MATLAB license remains checked out from the time MATLAB starts until it quits. The PCT license remains checked out from the first call to a PCT function, such as gpuDeviceCount(), until MATLAB quits.

            This section illustrates how to submit a small MATLAB program, whose parallel region runs on a GPGPU, as a batch MATLAB job to a PBS queue.

            A simple MATLAB/GPGPU program has a basic workflow:

            • 1) Initialize an array on the host (CPU).
            • 2) Copy array from CPU memory to GPGPU memory.
            • 3) Apply an operation to array on GPGPU.
            • 4) Copy array from GPGPU memory to CPU memory.

            Prepare a MATLAB script M-file which times the MATLAB function fft() running on a CPU and a GPGPU. Use an appropriate filename, here named myscript.m:
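
            A minimal sketch of such a script follows. It reports the available GPGPU devices with gpuDeviceCount() and gpuDevice(), prints the hostname, and then times fft() on the CPU and on the GPGPU; the matrix size (4096) is an illustrative choice, not a requirement:

            % FILENAME:  myscript.m

            % Report the GPGPU devices visible to this MATLAB session.
            fprintf('Number of GPU devices present:  %d\n', gpuDeviceCount());
            disp(gpuDevice());

            % Report the compute node running this script.
            [~, myhostname] = system('hostname');
            fprintf('myscript.m:  hostname:%s\n', strtrim(myhostname));

            % 1) Initialize an array on the host (CPU).
            A = rand(4096);

            % Time fft() on the CPU.
            tic;
            B = fft(A);
            fprintf('Elapsed time in FFT running on a CPU:  %f\n', toc);

            % 2) Copy the array from CPU memory to GPGPU memory.
            gA = gpuArray(A);

            % 3) Apply the operation to the array on the GPGPU, then
            % 4) copy the result from GPGPU memory back to CPU memory.
            tic;
            gB = fft(gA);
            B = gather(gB);
            fprintf('Elapsed time in FFT running on a GPU:  %f\n', toc);

            quit

            The GPGPU timing includes the gather() back to CPU memory, so it reflects the full round trip of the basic workflow listed above.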

            Prepare a job submission file with an appropriate filename, here named myjob.sub. Run MATLAB with and without the command-line option -singleCompThread:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            echo "myjob.sub"
            hostname
            
            module load matlab
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # -nodisplay:        run MATLAB in text mode; X11 server not needed
            # -singleCompThread: turn off implicit parallelism
            # -r:                read MATLAB program; use MATLAB JIT Accelerator
            matlab -nodisplay -singleCompThread -r myscript
            # matlab -nodisplay -r myscript
            

            Submit the job to a queue whose compute nodes are equipped with GPGPUs:

            $ qsub -q standby-g -l nodes=1,walltime=00:00:30 myjob.sub
            

            View job status:

            $ qstat -u myusername
            
            carter-adm.rcac.purdue.edu:
                                                                               Req'd  Req'd   Elap
            Job ID             Username     Queue    Jobname    SessID NDS TSK Memory Time  S Time
            ------------------ ----------   -------- ---------- ------ --- --- ------ ----- - -----
            182955.carter-adm  myusername   standby- myjob.sub    4181   1   1    --  00:00 R   --
            

            Job status shows one compute node (NDS) with one processor core (TSK).

            View results in the file for all standard output, myjob.sub.omyjobid:

            myjob.sub
            carter-g000.rcac.purdue.edu
            
                                        < M A T L A B (R) >
                              Copyright 1984-2011 The MathWorks, Inc.
                                R2011b (7.13.0.564) 64-bit (glnxa64)
                                          August 13, 2011
            
            
            To get started, type one of these: helpwin, helpdesk, or demo.
            For product information, visit www.mathworks.com.
            
            Number of GPU devices present:  1
            
              parallel.gpu.CUDADevice handle
              Package: parallel.gpu
            
              Properties:
                                  Name: 'Tesla M2070'
                                 Index: 1
                     ComputeCapability: '2.0'
                        SupportsDouble: 1
                         DriverVersion: 4.2000
                    MaxThreadsPerBlock: 1024
                      MaxShmemPerBlock: 49152
                    MaxThreadBlockSize: [1024 1024 64]
                           MaxGridSize: [65535 65535]
                             SIMDWidth: 32
                           TotalMemory: 5.6366e+09
                            FreeMemory: 5.5529e+09
                   MultiprocessorCount: 14
                          ClockRateKHz: 1147000
                           ComputeMode: 'Default'
                  GPUOverlapsTransfers: 1
                KernelExecutionTimeout: 0
                      CanMapHostMemory: 1
                       DeviceSupported: 1
                        DeviceSelected: 1
            
            
            
            
            myscript.m:  hostname:carter-g000.rcac.purdue.edu
            
            Implicit Parallelism:                              off           on
                                                       -singleCompThread
                                                       -----------------  --------
            Elapsed time in FFT running on a CPU:           0.367362      0.339608
            Elapsed time in FFT running on a GPU:           0.080177      0.065205
            

            Output shows that a single processor core on one compute node (g000) processed the entire job, both myjob.sub and myscript.m. ComputeCapability is level 2.0, and SupportsDouble is true. The timing table combines the results of two separate runs of the job script: one with and one without the command-line option -singleCompThread. In both cases, the runtime on the GPGPU is less than the runtime on the CPU.

            To scale up this method to handle a real application, increase the wall time in the qsub command to accommodate a longer running job.

            For more information about programming MATLAB for the GPGPU:

          • 7.1.9.13.16  Octave (Interpreting an M-file)

            7.1.9.13.16  Octave (Interpreting an M-file)

            GNU Octave is a high-level, interpreted programming language for numerical computations. The Octave interpreter is the part of Octave which reads M-files, oct-files, and MEX-files and executes Octave statements. Octave is a structured language (similar to C) and mostly compatible with MATLAB. You may use Octave to avoid the need for a MATLAB license, both during development and as a deployed application. By doing so, you may be able to run your application on more systems or more easily distribute it to others.

            This section illustrates how to submit a small Octave job to a PBS queue. This Octave example computes the inverse of a matrix.

            Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

            % FILENAME:  myjob.m
            
            % Invert matrix A.
            A = [1 2 3; 4 5 6; 7 8 0]
            inv(A)
            
            quit
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load octave
            cd $PBS_O_WORKDIR
            
            unset DISPLAY
            
            # Use the -q option to suppress startup messages.
            # octave -q < myjob.m
            octave < myjob.m
            

            The command octave myjob.m (without the redirection) also works in the preceding script.

            OR:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load octave
            
            unset DISPLAY
            
            # Use the -q option to suppress startup messages.
            # octave -q << EOF
            octave << EOF
            
            % Invert matrix A.
            A = [1 2 3; 4 5 6; 7 8 0]
            inv(A)
            
            quit
            EOF
            # EOF on a line by itself ends the Octave commands.
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            A =
            
               1   2   3
               4   5   6
               7   8   0
            
            ans =
            
              -1.77778   0.88889  -0.11111
               1.55556  -0.77778   0.22222
              -0.11111   0.22222  -0.11111
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about Octave:

          • 7.1.9.13.17  Octave Compiler (Compiling an M-file)

            7.1.9.13.17  Octave Compiler (Compiling an M-file)

            Octave does not offer a compiler to translate an M-file into an executable file for additional speed or distribution. You may wish to consider recoding an M-file as either an oct-file or a stand-alone program.

          • 7.1.9.13.18  Octave Executable (Oct-file)

            7.1.9.13.18  Octave Executable (Oct-file)

            An oct-file is an "Octave Executable". It offers a way for Octave code to call functions written in C, C++, or Fortran as though these external functions were built-in Octave functions. You may wish to use an oct-file if you would like to call an existing C, C++, or Fortran function directly from Octave rather than reimplementing that code as an Octave function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than Octave, you may be able to substantially improve performance over Octave source code, especially for statements like for and while.

            This section illustrates how to submit a small Octave job with an oct-file to a PBS queue. This Octave example calls a C function which adds two matrices.

            Prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

            /* Computational Routine */
            void matrixSum (double *a, double *b, double *c, int n) {
                int i;
            
                /* Component-wise addition. */
                for (i=0; i<n; i++) {
                    c[i] = a[i] + b[i];
                }
            }
            

            Combine the computational routine with an oct-file, which contains the necessary external function interface of Octave. The name of the file is matrixSum.cc:

            /**********************************************************
             * FILENAME:  matrixSum.cc
             *
             * Adds two MxN arrays (inMatrix).
             * Outputs one MxN array (outMatrix).
             *
             * The calling syntax is:
             *
             *      matrixSum (inMatrix, inMatrix, outMatrix, size)
             *
             * This is an oct-file for Octave.
             *
             **********************************************************/
            
            #include <cstdio>     /* printf */
            #include <cstdlib>    /* exit   */
            #include <octave/oct.h>
            
            /* Computational Routine */
            void matrixSum (double *a, double *b, double *c, int n) {
                int i;
            
                /* Component-wise addition. */
                for (i=0; i<n; i++) {
                    c[i] = a[i] + b[i];
                }
            }
            
            /* Gateway Function */
            DEFUN_DLD (matrixSum, args, nargout, "matrixSum: A + B") {
            
                NDArray inMatrix_a;                /* mxn input matrix   */
                NDArray inMatrix_b;                /* mxn input matrix   */
                int nrows_a,ncols_a;               /* size of matrix a   */
                int nrows_b,ncols_b;               /* size of matrix b   */
                NDArray outMatrix_c;               /* mxn output matrix  */
            
                /* Check for proper number of input arguments */
                if (args.length() != 2) {
                   printf("matrixSum:  two inputs required.");
                   exit(-1);
                }
                /* Check for proper number of output arguments */
                if (nargout != 1) {
                   printf("matrixSum:  one output required.");
                   exit(-1);
                }
            
                /* Check that both input matrices are real matrices. */
                if (!args(0).is_real_matrix()) {
                   printf("matrixSum:  expecting LHS (arg 1) to be a real matrix");
                   exit(-1);
                }
                if (!args(1).is_real_matrix()) {
                   printf("matrixSum:  expecting RHS (arg 2) to be a real matrix");
                   exit(-1);
                }
            
                /* Get dimensions of the first input matrix */
                nrows_a = args(0).rows();
                ncols_a = args(0).columns();
                /* Get dimensions of the second input matrix */
                nrows_b = args(1).rows();
                ncols_b = args(1).columns();
            
                /* Check for equal number of rows. */
                if(nrows_a != nrows_b) {
                   printf("matrixSum:  unequal number of rows.");
                   exit(-1);
                }
                /* Check for equal number of columns. */
                if(ncols_a != ncols_b) {
                   printf("matrixSum:  unequal number of rows.");
                   exit(-1);
                }
            
                /* Make a pointer to the real data in the first input matrix  */
                inMatrix_a = args(0).array_value();
                /* Make a pointer to the real data in the second input matrix  */
                inMatrix_b = args(1).array_value();
            
                /* Construct output matrix as a copy of the first input matrix. */
                outMatrix_c = args(0).array_value();
            
                /* Call the computational routine.  */
                double* ptr_a = inMatrix_a.fortran_vec();
                double* ptr_b = inMatrix_b.fortran_vec();
                double* ptr_c = outMatrix_c.fortran_vec();
                matrixSum(ptr_a,ptr_b,ptr_c,nrows_a*ncols_a);
            
                return octave_value(outMatrix_c);
            }
            

            To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

            $ module load octave
            

            To compile matrixSum.cc into an oct-file:

            $ mkoctfile matrixSum.cc
            

            Two new files appear after the compilation:

            matrixSum.o
            matrixSum.oct
            

            The name of the Octave-callable oct-file is matrixSum.oct.
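
            Before submitting a batch job, you can optionally verify the new oct-file interactively from the directory containing matrixSum.oct. Here is a brief sketch of such a session, using the same matrices as the batch example below:

            $ octave -q
            octave:1> C = matrixSum([1,1,1;1,1,1], [2,2,2;2,2,2])
            C =

               3   3   3
               3   3   3

            octave:2> quit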

            Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

            % FILENAME:  myjob.m
            
            % Call the separately compiled and dynamically linked oct-file.
            A = [1,1,1;1,1,1]
            B = [2,2,2;2,2,2]
            C = matrixSum(A,B)
            
            quit
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load octave
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # Use the -q option to suppress startup messages.
            # octave -q < myjob.m
            octave < myjob.m
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            A =
            
               1   1   1
               1   1   1
            
            B =
            
               2   2   2
               2   2   2
            
            C =
            
               3   3   3
               3   3   3
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about the Octave oct-file:

          • 7.1.9.13.19  Octave Standalone Program

            7.1.9.13.19  Octave Standalone Program

            A stand-alone Octave program is a C, C++, or Fortran program which calls user-written oct-files and the same libraries that Octave uses. A stand-alone program has access to Octave objects, such as the array and matrix classes, as well as all the Octave algorithms. If you would like to implement performance-critical routines in C, C++, or Fortran and still call select Octave functions, a stand-alone Octave program may be a good option. This offers the possibility for substantially improved performance over Octave source code, especially for statements like for and while while still allowing use of specialized Octave functions where useful.

            This section illustrates how to submit a small, stand-alone Octave program to a PBS queue. This C++ example uses class Matrix and calls an Octave script which prints a message.

            Prepare an Octave-compatible M-file with an appropriate filename, here named hello.m:

            % FILENAME:  hello.m
            
            disp('hello.m:    hello, world')
            

            Prepare a C++ function file with the necessary external function interface and with an appropriate filename, here named hello.cc:

            // FILENAME:  hello.cc
            
            #include <iostream>
            #include <octave/oct.h>
            #include <octave/octave.h>
            #include <octave/parse.h>
            #include <octave/toplev.h> /* do_octave_atexit */
            
            int main (const int argc, char ** argv) {
            
                const char * argvv [] = {"" /* name of program, not relevant */, "--silent"};
                octave_main (2, (char **) argvv, true /* embedded */);
            
                std::cout << "hello.cc:   hello, world" << std::endl;
            
                const octave_value_list result = feval ("hello");  /* invoke hello.m */
            
                int n = 2;
                Matrix a_matrix = Matrix (1,2);
                a_matrix (0,0) = 888;
                a_matrix (0,1) = 999;
                std::cout << "hello.cc:   " << a_matrix;
            
                do_octave_atexit ();
            
            }
            

            To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

            $ module load octave
            

            To compile the stand-alone Octave program:

            $ mkoctfile --link-stand-alone hello.cc -o hello
            

            Two new files appear after the compilation:

            hello
            hello.o
            

            The name of the compiled, stand-alone Octave program is hello.

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            # Load Octave so its dynamically linked libraries are available at run time.
            module load octave
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            ./hello
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            hello.cc:   hello, world
            hello.m:    hello, world
            hello.cc:    888 999
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about the Octave stand-alone program:

          • 7.1.9.13.20  Octave (MEX-file)

            7.1.9.13.20  Octave (MEX-file)

            MEX stands for "MATLAB Executable". A MEX-file offers a way for MATLAB code to call functions written in C, C++ or Fortran as though these external functions were built-in MATLAB functions. You may wish to use a MEX-file if you would like to call an existing C, C++, or Fortran function directly from MATLAB rather than reimplementing that code as a MATLAB function. Also, by implementing performance-critical routines in C, C++, or Fortran rather than MATLAB, you may be able to substantially improve performance over MATLAB source code, especially for statements like for and while.

            Octave includes an interface which can link compiled, legacy MEX-files. This interface allows sharing code between Octave and MATLAB users. In Octave, an oct-file will always perform better than a MEX-file, so you should write new code using the oct-file interface, if possible. However, you may test a new MEX-file in Octave then use it in a MATLAB application.

            This section illustrates how to submit a small Octave job with a MEX-file to a PBS queue. This Octave example calls a C function which adds two matrices.

            Prepare a complicated and time-consuming computation in the form of a C, C++, or Fortran function. In this example, the computation is a C function which adds two matrices:

            /* Computational Routine */
            void matrixSum (double *a, double *b, double *c, int n) {
                int i;
            
                /* Component-wise addition. */
                for (i=0; i<n; i++) {
                    c[i] = a[i] + b[i];
                }
            }
            

            Combine the computational routine with a MEX-file, which contains the necessary external function interface of MATLAB. In the computational routine, change int to mwSize. The name of the file is matrixSum.c:

            /*************************************************************
             * FILENAME:  matrixSum.c
             *
             * Adds two MxN arrays (inMatrix).
             * Outputs one MxN array (outMatrix).
             *
             * The calling syntax is:
             *
             *      matrixSum(inMatrix, inMatrix, outMatrix, size)
             *
             * This is a MEX-file which Octave will execute.
             *
             **************************************************************/
            
            #include "mex.h"
            
            /* Computational Routine */
            void matrixSum (double *a, double *b, double *c, mwSize n) {
                mwSize i;
            
                /* Component-wise addition. */
                for (i=0; i<n; i++) {
                    c[i] = a[i] + b[i];
                }
            }
            
            /* Gateway Function */
            void mexFunction (int nlhs, mxArray *plhs[],
                              int nrhs, const mxArray *prhs[]) {
            
                double *inMatrix_a;               /* mxn input matrix  */
                double *inMatrix_b;               /* mxn input matrix  */
                mwSize nrows_a,ncols_a;           /* size of matrix a  */
                mwSize nrows_b,ncols_b;           /* size of matrix b  */
                double *outMatrix_c;              /* mxn output matrix */
            
                /* Check for proper number of arguments */
                if(nrhs!=2) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:nrhs","Two inputs required.");
                }
                if(nlhs!=1) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:nlhs","One output required.");
                }
            
                /* Get dimensions of the first input matrix */
                nrows_a = mxGetM(prhs[0]);
                ncols_a = mxGetN(prhs[0]);
                /* Get dimensions of the second input matrix */
                nrows_b = mxGetM(prhs[1]);
                ncols_b = mxGetN(prhs[1]);
            
                /* Check for equal number of rows. */
                if(nrows_a != nrows_b) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of rows.");
                }
                /* Check for equal number of columns. */
                if(ncols_a != ncols_b) {
                    mexErrMsgIdAndTxt("MyToolbox:matrixSum:notEqual","Unequal number of columns.");
                }
            
                /* Make a pointer to the real data in the first input matrix  */
                inMatrix_a = mxGetPr(prhs[0]);
                /* Make a pointer to the real data in the second input matrix  */
                inMatrix_b = mxGetPr(prhs[1]);
            
                /* Make the output matrix */
                plhs[0] = mxCreateDoubleMatrix(nrows_a,ncols_a,mxREAL);
            
                /* Make a pointer to the real data in the output matrix */
                outMatrix_c = mxGetPr(plhs[0]);
            
                /* Call the computational routine */
                matrixSum(inMatrix_a,inMatrix_b,outMatrix_c,nrows_a*ncols_a);
            }
            

            To access the Octave utility mkoctfile, load an Octave module. Loading Octave also loads a compatible GCC:

            $ module load octave
            

            To compile matrixSum.c into a MEX-file:

            $ mkoctfile --mex matrixSum.c
            

            Two new files appear after the compilation:

            matrixSum.mex
            matrixSum.o
            

            The name of the Octave-callable MEX-file is matrixSum.mex.

            Prepare an Octave-compatible M-file with an appropriate filename, here named myjob.m:

            % FILENAME:  myjob.m
            
            % Call the separately compiled and dynamically linked MEX-file.
            A = [1,1,1;1,1,1]
            B = [2,2,2;2,2,2]
            C = matrixSum(A,B)
            
            quit
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load octave
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # Use the -q option to suppress startup messages.
            # octave -q < myjob.m
            octave < myjob.m
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            A =
            
               1   1   1
               1   1   1
            
            B =
            
               2   2   2
               2   2   2
            
            C =
            
               3   3   3
               3   3   3
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about the Octave-compatible MEX-file:

          • 7.1.9.13.21  Perl

            7.1.9.13.21  Perl

            Perl is a high-level, general-purpose, interpreted, dynamic programming language offering powerful text processing features. This section illustrates how to submit a small Perl job to a PBS queue. This Perl example prints a single line of text.

            Prepare a Perl input file with an appropriate filename, here named myjob.in:

            # FILENAME:  myjob.in
            
            print "hello, world\n"
            

            Discover the absolute path of Perl:

            $ which perl
            /usr/local/bin/perl
            

            Perl is also available at a second absolute path, /usr/bin/perl, which the job submission file below uses.

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            # Use the -w option to issue warnings.
            /usr/bin/perl -w myjob.in
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            hello, world
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            For more information about Perl:

          • 7.1.9.13.22  Python

            7.1.9.13.22  Python

            Python is a high-level, general-purpose, interpreted, dynamic programming language. We suggest using Anaconda, a Python distribution made for large-scale data processing, predictive analytics, and scientific computing. This section illustrates how to submit a small Python job to a PBS queue. This Python example prints a single line of text.

            Prepare a Python input file with an appropriate filename, here named myjob.in:

            # FILENAME:  myjob.in
            
            import string, sys
            print "hello, world"
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load anaconda
            cd $PBS_O_WORKDIR
            unset DISPLAY
            
            python myjob.in
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            hello, world
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            If you would like to install a Python package for your own personal use, you may do so by following these directions. Make sure you have a download link to the software you want to use, and substitute it on the wget line below.

            $ mkdir ~/src
            $ cd ~/src
            $ wget http://path/to/source/tarball/app-1.0.tar.gz
            $ tar xzvf app-1.0.tar.gz
            $ cd app-1.0
            $ module load anaconda
            $ python setup.py install --user
            $ cd ~
            $ python
            >>> import app
            >>> quit()
            

            The "import app" line should return without any output if installed successfully. You can then import the package in your python scripts.

            For more information about Python:

            For a list of modules currently installed in the anaconda python distribution:

            $ module load anaconda
            $ conda list
            # packages in environment at /apps/rhel6/Anaconda-2.0.1:
            #
            _license                  1.1                      py27_0
            anaconda                  2.0.1                np18py27_0
            ...
              

            If any other Python modules are needed, please contact us.

          • 7.1.9.13.23  R

            7.1.9.13.23  R

            R, a GNU project, is a language and environment for statistics and graphics. It is an open-source implementation of the S programming language. This section illustrates how to submit a small R job to a PBS queue. This R example computes a Pythagorean triple.

            Prepare an R input file with an appropriate filename, here named myjob.in:

            # FILENAME:  myjob.in
            
            # Compute a Pythagorean triple.
            a = 3
            b = 4
            c = sqrt(a*a + b*b)
            c     # display result
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load r
            cd $PBS_O_WORKDIR
            
            # --vanilla: do not read startup files; do not restore or save the workspace
            # --no-save: do not save datasets at the end of an R session
            R --vanilla --no-save < myjob.in
            

            OR:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load r
            
            # --vanilla: do not read startup files; do not restore or save the workspace
            # --no-save: do not save datasets at the end of an R session
            R --vanilla --no-save << EOF
            
            # Compute a Pythagorean triple.
            a = 3
            b = 4
            c = sqrt(a*a + b*b)
            c     # display result
            EOF
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

            Warning: no access to tty (Bad file descriptor).
            Thus no job control in this shell.
            
            R version 2.9.0 (2009-04-17)
            Copyright (C) 2009 The R Foundation for Statistical Computing
            ISBN 3-900051-07-0
            
            R is free software and comes with ABSOLUTELY NO WARRANTY.
            You are welcome to redistribute it under certain conditions.
            Type 'license()' or 'licence()' for distribution details.
            
            R is a collaborative project with many contributors.
            Type 'contributors()' for more information and
            'citation()' on how to cite R or R packages in publications.
            
            Type 'demo()' for some demos, 'help()' for on-line help, or
            'help.start()' for an HTML browser interface to help.
            Type 'q()' to quit R.
            
            > # FILENAME:  myjob.in
            >
            > # Compute a Pythagorean triple.
            > a = 3
            > b = 4
            > c = sqrt(a*a + b*b)
            > c     # display result
            [1] 5
            >
            

            Any output written to standard error will appear in myjob.sub.emyjobid.

            To install additional R packages, create a folder in your home directory called Rlibs. You will need to be running a recent version of R (2.14.0 or greater as of this writing):

            $ mkdir ~/Rlibs

            If you are running the bash shell (the default on our clusters), add the following line to your ~/.bashrc. Create the file ~/.bashrc if it does not already exist; you may also need to run "ln -s .bashrc .bash_profile" if .bash_profile does not exist either:

            export R_LIBS=~/Rlibs:$R_LIBS

            If you are running csh or tcsh, add the following to your .cshrc:

            setenv R_LIBS ~/Rlibs:$R_LIBS

            Now run "source .bashrc" and start R:

            $ module load r
            $ R
            > .libPaths()
            [1] "/home/myusername/Rlibs"
            [2] "/apps/rhel5/R-2.14.0/lib64/R/library"
            

            .libPaths() should output something similar to above if it is set up correctly. Now let's try installing a package.

            > install.packages('packagename',"~/Rlibs","http://streaming.stat.iastate.edu/CRAN")

            The above command should download and install the requested R package, which upon completion can then be loaded.

            > library('packagename')

            If your R package relies on a library that's only installed as a module (for this example we'll use GDAL), you can install it by doing the following:

            $ module load gdal
            $ module load r
            $ R
            > install.packages('rgdal',"~/Rlibs","http://streaming.stat.iastate.edu/CRAN", configure.args="--with-gdal-include=$GDAL_HOME/include --with-gdal-lib=$GDAL_HOME/lib")
            

            Repeat install.packages(...) for any packages that you need. Your R packages should now be installed.

            For more information about R:

          • 7.1.9.13.24  SAS

            7.1.9.13.24  SAS

            SAS (pronounced "sass") is an integrated system supporting statistical analysis, report generation, business planning, and forecasting. This section illustrates how to submit a small SAS job to a PBS queue. This SAS example displays a small dataset.

            Prepare a SAS input file with an appropriate filename, here named myjob.sas:

            * FILENAME:  myjob.sas;
            
            /* Display a small dataset. */
            TITLE 'Display a Small Dataset';
            DATA grades;
            INPUT name $ midterm final;
            DATALINES;
            Anne     61 64
            Bob      71 71
            Carla    86 80
            David    79 77
            Edwardo  73 73
            Fannie   81 81
            ;
            PROC PRINT data=grades;
            RUN;
            

            Prepare a job submission file with an appropriate filename, here named myjob.sub:

            #!/bin/sh -l
            # FILENAME:  myjob.sub
            
            module load sas
            cd $PBS_O_WORKDIR
            
            # -stdio:   run SAS in batch mode:
            #              read SAS input from stdin
            #              write SAS output to stdout
            #              write SAS log to stderr
            # -nonews:  do not display SAS news
            # SAS runs in batch mode when the name of the SAS command file
            # appears as a command-line argument.
            sas -stdio -nonews myjob
            

            Submit the job:

            $ qsub -l nodes=1 myjob.sub
            

            View job status:

            $ qstat -u myusername
            

            View results in the file for all standard output, myjob.sub.omyjobid:

                                                                       The SAS System                       10:59 Wednesday, January 5, 2011   1
            
                                                             Obs    name       midterm    final
            
                                                              1     Anne          61        64
                                                              2     Bob           71        71
                                                              3     Carla         86        80
                                                              4     David         79        77
                                                              5     Edwardo       73        73
                                                              6     Fannie        81        81
            

            View the SAS log in the standard error file, myjob.sub.emyjobid:

            1                                                          The SAS System                           12:32 Saturday, January 29, 2011
            
            NOTE: Copyright (c) 2002-2008 by SAS Institute Inc., Cary, NC, USA.
            NOTE: SAS (r) Proprietary Software 9.2 (TS2M0)
                  Licensed to PURDUE UNIVERSITY - T&R, Site 70063312.
            NOTE: This session is executing on the Linux 2.6.18-194.17.1.el5rcac2 (LINUX) platform.
            
            
            
            NOTE: SAS initialization used:
                  real time           0.70 seconds
                  cpu time            0.03 seconds
            
            1          * FILENAME:  myjob.sas
            2
            3          /* Display a small dataset. */
            4          TITLE 'Display a Small Dataset';
            5          DATA grades;
            6          INPUT name $ midterm final;
            7          DATALINES;
            
            NOTE: The data set WORK.GRADES has 6 observations and 3 variables.
            NOTE: DATA statement used (Total process time):
                  real time           0.18 seconds
                  cpu time            0.01 seconds
            
            
            14         ;
            15         PROC PRINT data=grades;
            16         RUN;
            
            NOTE: There were 6 observations read from the data set WORK.GRADES.
            NOTE: The PROCEDURE PRINT printed page 1.
            NOTE: PROCEDURE PRINT used (Total process time):
                  real time           0.32 seconds
                  cpu time            0.04 seconds
            
            
            NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
            NOTE: The SAS System used:
                  real time           1.28 seconds
                  cpu time            0.08 seconds
            

            For more information about SAS:

    • 7.2  Running Jobs via HTCondor

      7.2  Running Jobs via HTCondor

      HTCondor allows you to run jobs on systems that would otherwise sit idle, for as long as their primary users do not need them. HTCondor is one of several distributed computing systems which ITaP makes available. Most ITaP research resources, in addition to being available through normal means, are part of BoilerGrid and are accessible via HTCondor. If a primary user needs a processor core on a compute node, HTCondor immediately checkpoints or migrates any HTCondor jobs on that node and returns the resource to the primary user. Shorter jobs therefore have a better completion rate via HTCondor than longer jobs; even so, although HTCondor may have to restart jobs elsewhere, BoilerGrid offers users a vast amount of computational resources. Nearly all ITaP research systems are part of BoilerGrid, as are large numbers of lab machines at the West Lafayette and other Purdue campuses, making BoilerGrid one of the largest HTCondor pools in the world. Some machines at other institutions are also part of a larger HTCondor federation known as DiaGrid and are available as well.

      For more information: