The Radon cluster is composed of desktop PCs recycled from instructional computing labs. Radon is currently entirely 64-bit Dell systems with Intel Pentium4 or Xeon processors of various speeds and with memory configurations between 2 and 4 GB of RAM. Nodes are connected with either 100 Mb/s or Gigabit Ethernet. The machines reclaimed from instructional labs are older, slower, and lack high-speed interconnects, so communication-intensive or sizeable multithreaded programs are not a good fit. Still, there are a fair number of machines, and some codes may be able to take advantage of these effectively.
Radon is currently divided into three different sub-clusters, each with a different combination of CPU speed, memory, and interconnect. Subcluster "a" nodes have 3.6 GHz single-core Intel Pentium4 CPUs, 2 GB RAM, and Gigabit Ethernet; subcluster "b" nodes, 3.2 GHz dual-core Intel Xeon CPUs, 2 GB RAM, and Gigabit Ethernet; and subcluster "c" nodes, 3.2 GHz dual-core Intel Xeon CPUs, 4 GB RAM, and Gigabit Ethernet.
| Sub-Cluster | Number of Nodes | Processor | Cores per Node | Memory per Node | Interconnect | Theoretical Peak TeraFLOPS |
|---|---|---|---|---|---|---|
| radon-a[001-144] | 144 | 3.6 GHz Intel Pentium4 | 1 | 2 GB | Gigabit Ethernet | 1.03 |
| radon-b[001-048] | 48 | 3.2 GHz Intel Xeon | 2 | 2 GB | Gigabit Ethernet | 0.61 |
| radon-c[001-048] | 48 | 3.2 GHz Intel Xeon | 2 | 4 GB | Gigabit Ethernet | 0.61 |
All Radon nodes run Red Hat Enterprise Linux 5 and use PBSPro 9.x for resource and job management. Operating system patches are applied monthly or as security needs dictate. All nodes have been configured to allow for unlimited stack usage, as well as unlimited core dump size (though disk space and server quotas may still be a limiting factor).
Radon is a cluster operated by RCAC. Purdue faculty and staff, and students with the approval of their advisor, may request access to Radon using the online Research Computing Account Request Form.
To issue jobs on Radon, users may log on to the front-end host radon.rcac.purdue.edu via SSH.
All access to the RCAC systems must be through secure (encrypted) connections. Standard telnet and FTP are not supported. SSH, SCP, and SFTP may be used instead.
Secure Shell or SSH is a way of establishing a secure channel between a local and a remote computer. It uses public-key cryptography to authenticate the remote computer and (optionally) to allow the remote computer to authenticate the user. It is usually used to log in to a remote machine and execute commands, much as telnet is, but it also supports tunneling and forwarding of X11 or arbitrary TCP connections. The associated SFTP and SCP protocols may be used to transfer files. There are many SSH clients available, depending on the operating system you use.
Linux / Solaris / AIX / HP-UX / Unix:
Microsoft Windows:
Mac OS X:
SSH can be used in conjunction with many different means of authentication. One popular authentication method is called Public Key Authentication (PKA). PKA is a method of establishing your identity to a remote computer using related sets of encryption data called keys. PKA is a more secure alternative to traditional password-based authentication with which you are probably familiar.
To employ PKA via SSH, you manually generate a keypair (also called SSH keys) in the location from which you wish to initiate a connection to a remote machine. This keypair consists of two text files, one of which is called a private key and the other a public key. You keep the private key file confidential on your local machine or local home directory (hence the name "private" key). You then log in to a remote machine (if possible) and append the corresponding public key text to the end of a specific file, or have a system administrator do so on your behalf. In future login attempts, the public and private keys are compared to verify your identity, which then grants you access to the remote machine.
As a user, you can create, maintain, and employ as many keypairs as you wish. If you connect to a computational resource from your work laptop, your work desktop, and your home desktop, you can create and employ keypairs on each. You can also create multiple keypairs on a single local machine to serve different purposes, such as establishing access to different remote machines, or establishing different types of access to a single remote machine. In short, PKA via SSH offers a secure but flexible means of identifying yourself to all kinds of computational resources.
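As a rough illustration of creating a dedicated keypair, the sketch below assumes the OpenSSH `ssh-keygen` tool is installed on the local machine. The throwaway directory, the key file name `id_rsa_radon`, and the demo passphrase are all hypothetical placeholders; the passphrase is passed with `-N` only so that the sketch can run unattended.

```shell
#!/bin/sh
# Sketch: generate one keypair for one purpose (assumes OpenSSH's ssh-keygen).
# The directory, key name, and passphrase are placeholders for illustration.
keydir=$(mktemp -d)                 # in real use this would normally be ~/.ssh
keyfile="$keydir/id_rsa_radon"      # hypothetical name: one keypair per purpose

if command -v ssh-keygen >/dev/null 2>&1; then
    # -N supplies the passphrase non-interactively for this demo only;
    # in real use, omit -N so that ssh-keygen prompts for the passphrase.
    ssh-keygen -q -t rsa -N "demo passphrase only" -f "$keyfile"
    ls "$keydir"    # lists the private key file and the matching .pub file
else
    echo "ssh-keygen not found; skipping key generation" >&2
fi
```

The `.pub` file holds the public key text that is appended on the remote side; on OpenSSH servers that file is conventionally `~/.ssh/authorized_keys`, though the exact location on a given system is an assumption here. The private key file stays on the local machine.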
When you create a keypair, you are prompted to provide a passphrase for the private key. This passphrase is different from a password in a number of ways. First, a passphrase is, as the name implies, a phrase. It can include most types of characters, including spaces, and has no limits on length. Second, this passphrase is not transmitted to the remote machine for verification. It is used only to allow the use of your local private key and is specific to that one private key.
Perhaps you are wondering why you would need a private key passphrase at all when using PKA. If the private key is kept secure, why the need for a passphrase just to use it? Indeed, if the location of your private keys were always completely secure, a passphrase might not be needed. In reality, a number of situations could arise in which someone may improperly gain access to your private key files. In these situations, a passphrase offers another level of security for you, the user who created the keypair.
Think of the private key/passphrase combination as being analogous to your ATM card/PIN combination. The ATM card itself is the object that grants access to your important accounts, and as such, should be kept secure at all times—just as a private key should. But if you ever lose your wallet or your ATM card is stolen, you are glad that your PIN exists to offer you another level of protection. The same is true for a private key passphrase.
When you create a keypair, you should always provide a corresponding private key passphrase. For security purposes, avoid using phrases that would be guessed by automated programs (e.g. phrases that consist solely of words in English-language dictionaries). This passphrase can never be recovered if forgotten, so make note of it. There are only limited situations when the use of a non-passphrase-protected private key is warranted—conducting automated file backups is one such situation. If you need to use a non-passphrase-protected private key to conduct automated backups to Fortress, see the No-Passphrase SSH Keys section.
If you have received a default password as part of the process of obtaining your account, you should change it immediately when you log on for the first time. This can be done from any terminal/SSH session with the command "passwd". You will have the same password on all RCAC systems. If you change your password on any one RCAC system, it will change on all RCAC systems.
If you already have a Purdue career account, then you will initially be given the same userid and password as your career account. There is no need to change your career account password because you have received an account on RCAC systems.
There is not currently any requirement regarding how often you must change your password within RCAC, but for security reasons changing a password every six months, preferably every three months, is good practice.
All passwords should:
Never share your password with another user or make your password known to anyone else. Systems staff will NEVER ask for your password, by email or otherwise.
File storage options on RCAC systems include home directories, scratch file systems, /tmp, and long-term or permanent storage. Each of these has different performance characteristics and intended uses, and some vary from system to system as well. Home directories and long-term storage are backed up nightly, but scratch and /tmp are not and may occasionally be purged without warning. Below is more detail about each of these storage options.
Your home directory is the default directory you are placed in when you log in.
You should use this space for storing files you want to keep long term such as source code, scripts, input data sets, etc. It should also be used for files you want to keep and which you use often. The home directory will physically reside on the BlueArc NFS Server. You can find the path to your home directory by logging in, and typing pwd:
$ pwd
/home/ba01/u103/myusername
The second component of the reply indicates the name of the host where your home directory physically resides. In this example, the home directory is on the RCAC home directory file server named "ba01" under area "u103". This will vary from person to person. Remember, you can always check where your home directory is located by doing a pwd command in your home directory.
Regardless of its physical location, your home directory and its contents are available on almost all the RCAC front-end hosts and their nodes via the Network File System (NFS). The only exception is Black.
Note that your home directory has a quota capping the size and/or number of files you may store within. For more information, refer to the Storage Quotas / Limits Section.
Only files which have been backed up overnight can be recovered. If you lose a file the same day you created it, it can NOT be recovered.
For files lost less than seven days ago, RCAC has implemented self-service file recovery. Backups of all your files are made at midnight daily and you may access these directly.
To recover files lost after midnight today (same day as loss):
$ set BACKUP=`echo $HOME | sed "s,/u1,_snap/backup_snap/u1,;s,/home/,/autohome/,"`
$ cd $BACKUP
(now locate the file or directory you wish to recover within here)
$ cp mylostfile $HOME
(or)
$ cp -r mylostdir $HOME
To recover files lost prior to today, but in last week (2-7 day loss):
(set this to the date you lost the files: 4-digit year, 2-digit month, 2-digit day)
$ set DATE=YYYYMMDD
$ set BACKUP=`echo $HOME | sed "s,/u1,_snap/backup_snap_$DATE*/u1,;s,/home/,/autohome/,"`
$ cd $BACKUP
(now locate the file or directory you wish to recover within here)
$ cp mylostfile $HOME
(or)
$ cp -r mylostdir $HOME
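To make the path rewriting in these recovery commands easier to follow, here is a small sketch that applies the same two sed substitutions to a sample home directory path. It is written for sh and bash rather than with the csh-style `set` used in the commands; the helper name `snapshot_path` and the date `20090115` are hypothetical.

```shell
#!/bin/sh
# Sketch: show how the sed expressions rewrite a home directory path into the
# corresponding snapshot path. snapshot_path is a hypothetical helper name.
snapshot_path() {
    home_dir=$1
    snap_date=$2    # optional YYYYMMDD; leave empty for the same-day form
    if [ -n "$snap_date" ]; then
        suffix="_${snap_date}*"     # the trailing * is expanded later by cd
    else
        suffix=""
    fi
    printf '%s\n' "$home_dir" |
        sed "s,/u1,_snap/backup_snap${suffix}/u1,;s,/home/,/autohome/,"
}

snapshot_path /home/ba01/u103/myusername
# prints /autohome/ba01_snap/backup_snap/u103/myusername
snapshot_path /home/ba01/u103/myusername 20090115
# prints /autohome/ba01_snap/backup_snap_20090115*/u103/myusername
```

The first substitution inserts the snapshot directory in front of the "u1..." area name, and the second swaps the leading /home/ for /autohome/.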
For files lost more than seven days ago, you will need to request RCAC recover your files from backup tapes. Please do so using the flost command from the front-end host of an RCAC resource:
$ flost
Scratch directories are provided by RCAC and are intended for short-term file storage only.
Backups are not performed on scratch directories. In the event of a disk crash or file purge, files in scratch directories can not be recovered. Please be sure to copy any important files to more permanent storage.
All files stored in RCAC scratch directories older than 90 days will be automatically removed (purged). Owners of these files will be notified one week before removal via email. For more information, please refer to our Scratch File Purging Policy.
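To get an early idea of which files might be approaching the 90-day threshold, something like the following sketch may help. It assumes file age is judged by modification time, which may not match the criterion the purge actually uses; the helper name `list_purge_candidates` is hypothetical.

```shell
#!/bin/sh
# Sketch: list files under a directory not modified in more than 90 days.
# Assumption: age is judged by modification time; the real purge policy may
# use a different criterion. list_purge_candidates is a hypothetical helper.
list_purge_candidates() {
    find "$1" -type f -mtime +90 -print
}

# Demonstration on a throwaway directory rather than a real scratch space.
demo=$(mktemp -d)
touch "$demo/recent_result.dat"
touch -t 200001010000 "$demo/old_result.dat"   # backdate to 1 Jan 2000
list_purge_candidates "$demo"                  # lists only old_result.dat
```

On an RCAC system the same helper could be pointed at the scratch directory, for example `list_purge_candidates "$RCAC_SCRATCH"`.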
RCAC scratch directories are provided by a central BlueArc server and are accessible from most RCAC systems. There are two primary scratch file systems: scratch95 and scratch96. A scratch directory already exists for all Radon users. Your RCAC scratch directory is located under scratch95 or scratch96 within a subdirectory by the first letter of your username.
To find the path to your RCAC scratch directory, run myscratch:
$ myscratch
/scratch/scratch96/m/myusername
The variable $RCAC_SCRATCH is also set to your RCAC scratch directory path. Use this variable in any scripts. Your actual scratch directory path may change without warning, but this variable will remain current.
$ echo $RCAC_SCRATCH
/scratch/scratch96/m/myusername
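A script can build its working paths from the variable rather than from a literal path. The sketch below is one way to do that in sh or bash; the fallback to `$TMPDIR` or /tmp is an assumption added only so that the snippet also runs where RCAC_SCRATCH is not set, and `myproject` is a placeholder name.

```shell
#!/bin/sh
# Sketch: derive working paths from $RCAC_SCRATCH instead of a hard-coded path.
# The fallback to $TMPDIR or /tmp is an assumption for machines where
# RCAC_SCRATCH is not set; "myproject" is a placeholder directory name.
scratch_root=${RCAC_SCRATCH:-${TMPDIR:-/tmp}}
workdir="$scratch_root/myproject"
mkdir -p "$workdir"
echo "working in $workdir"
```

Because the script never spells out the scratch path itself, it keeps working if the underlying scratch location changes.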
To find the path to someone else's RCAC scratch directory, use the command findscratch:
$ findscratch someuser
/scratch/scratch95/s/someuser
Note that your RCAC scratch directory has a quota capping the size and/or number of files you may store within. For more information, refer to the Storage Quotas / Limits Section.
The /tmp directory is intended for temporary files that are used during the execution of a process or job or while you examine files created by your jobs. Used properly, /tmp may provide faster local storage to an active process than any other storage option. However, do not use it for longer-term storage or critical results.
Files stored in /tmp are not backed up and are removed whenever space is low or whenever the system is rebooted. In the event of a loss, files in /tmp can not be recovered, so use it only for files that can be recreated relatively easily.
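One common pattern, sketched here under stated assumptions, is to give each job a private directory under /tmp, do the intermediate work there, and copy only the final results to another storage location before cleaning up. The file names and the destination directory are placeholders; the destination falls back to a temporary location when RCAC_SCRATCH is not set, purely so that the sketch runs anywhere.

```shell
#!/bin/sh
# Sketch: use a private directory under /tmp for intermediate files, then copy
# the results elsewhere before cleaning up. All names are placeholders.
results_dir=${RCAC_SCRATCH:-${TMPDIR:-/tmp}}/myproject_results   # assumed destination
tmp_work=$(mktemp -d "${TMPDIR:-/tmp}/myjob.XXXXXX")
trap 'rm -rf "$tmp_work"' EXIT      # remove the temporary directory on exit

# Stand-in for real work: write and process an intermediate file locally.
echo "intermediate data" > "$tmp_work/partial.out"
sort "$tmp_work/partial.out" > "$tmp_work/final.out"

# Copy only what must survive; /tmp contents cannot be recovered if lost.
mkdir -p "$results_dir"
cp "$tmp_work/final.out" "$results_dir/"
```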
Long-term Storage or Permanent Storage is available to RCAC users on the DXUL/UniTree archival storage system, commonly referred to as "Fortress". DXUL (DiskXtender for Unix and Linux) and UniTree together make up a software package that manages a hierarchical storage system. Program files, data files, and any other files which are not used often, but which must be saved, can be put in permanent storage. Fortress currently has a 1.2 PB capacity. However, since two copies are retained for every file, the usable capacity is only 600 TB.
Recently used files smaller than 0.5 MB have their primary copy stored on low-cost disks, but the second copy is on tape or optical disks. This provides a rapid restore time to the disk cache. However, the large latency to access a larger file (usually involving a copy from a tape cartridge) makes it unsuitable for use as active storage.
In addition to poor performance, these two uses can cause severe problems with the system itself:
Do not use Fortress as a second home directory. Instead, use tar or some similar archive tool to combine all the smaller files you wish to store into a single large file first.
For active data storage you should use either local storage or a scratch file system. You may then copy any results you wish to archive to Fortress when computation is complete.
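As a minimal sketch of combining many small files into one archive before sending it to Fortress, the following uses a throwaway directory and placeholder file names. Only standard tar options are used.

```shell
#!/bin/sh
# Sketch: combine many small files into one compressed archive before archiving.
# The directory and file names are placeholders created only for this demo.
demo=$(mktemp -d)
mkdir "$demo/results"
for i in 1 2 3 4 5; do
    echo "run $i" > "$demo/results/run_$i.txt"
done

# -C changes directory first so the archive stores relative paths (results/...).
tar czf "$demo/results.tar.gz" -C "$demo" results

# List the archive to confirm its contents before removing any originals.
tar tzf "$demo/results.tar.gz"
```

The single results.tar.gz file is then the item to copy to long-term storage, in place of the individual small files.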
Fortress is directly accessible (via FTP, SSH, SCP, SFTP, and NFS) from all RCAC systems, as well as most systems in ECN and CS and from several other major servers on campus. To access Fortress in any way other than NFS, you must log in to fortress.rcac.purdue.edu. RCAC has more information about Fortress, including how to obtain a Fortress account and how to access your files on Fortress.
There are many environment variables related to storage locations and paths which are automatically set for you upon log-in, or may be changed if necessary. In addition, many more environment variables are set for specific applications, such as compilers, when "modules" for these applications are loaded. (See the module command section for more information.)
Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change. Some of the environment variables you should have are:
Environment variable names are conventionally all uppercase, and a variable's value is referenced by prefixing its name with a dollar sign ($). Variables may be used on the command line or in any scripts in place of and in combination with hard-coded values:
$ ls $HOME
...
$ ls $RCAC_SCRATCH/myproject
...
$ ls $RCAC_SCRATCH/myproject/${HOSTNAME}_data
...
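One detail worth illustrating when combining variables with literal text: if the text that follows a variable name could itself be part of a name (letters, digits, or underscores), braces are needed to mark where the name ends. The sketch below uses `uname -n` to obtain the host name portably; the variable names `node` and `data_dir` are placeholders.

```shell
#!/bin/sh
# Sketch: use braces when text follows a variable name directly.
node=$(uname -n)            # host name, obtained portably
# "${node}_data" expands node and then appends _data.
# Written as "$node_data", the shell would instead look up a different
# variable named node_data, which is normally unset.
data_dir="${node}_data"
echo "$data_dir"
```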
You may find the value of any environment variable by using the echo command:
$ echo $RCAC_SCRATCH
/scratch/scratch95/m/myusername
$ echo $SHELL
/usr/local/bin/tcsh
You may list the values of all environment variables using the env command:
$ env
USER=myusername
HOME=/home/ba01/u101/myusername
RCAC_SCRATCH=/scratch/scratch95/m/myusername
SHELL=/usr/local/bin/tcsh
...
You may create or overwrite an environment variable using either export or setenv, depending on your shell:
(for bash and sh)
$ export VARIABLE=value
(for tcsh and csh)
% setenv VARIABLE value
Your disk usage is limited on RCAC systems. However, each filesystem (scratch, home directory, etc.) may have a different limit. If you exceed the soft limit or quota, you will see warnings whenever writing to the disk that you are over quota, but the write will still succeed. If you exceed the hard limit or limit, your write will fail until you either remove other files or your quota is increased. Generally, RCAC systems do not impose a soft limit—only a hard limit.
You may find out what your current quota is by using the quota command:
$ quota
Disk quotas for user myusername (uid 12345):
Filesystem blocks quota limit grace files quota limit grace
ba01:/u103 2346272 0 5000000 17508 0 65535
The columns are as follows:
You may also see the disk usage of any given directory by using du:
$ du -hs
1.1G .
$ du -hs $HOME
138M /home/ba01/u103/myusername
This can be very helpful in figuring out where your largest files or directories are, so that you may clean out unneeded large files and avoid hitting your quota.
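To rank the contents of a directory by size, du can be combined with sort. The helper below is a hypothetical sketch, demonstrated on a throwaway directory; note that the `*` pattern skips hidden files and directories.

```shell
#!/bin/sh
# Sketch: rank the entries of a directory by disk usage, largest first.
# largest_entries is a hypothetical helper; sizes are reported in kilobytes.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Demonstration on a throwaway directory with one small and one large entry.
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
dd if=/dev/urandom of="$demo/small/a.dat" bs=1024 count=16   2>/dev/null
dd if=/dev/urandom of="$demo/big/b.dat"   bs=1024 count=4096 2>/dev/null
largest_entries "$demo" 5
```

Running `largest_entries "$HOME"` would show the ten largest visible entries in a home directory.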
If you find you need additional disk space on an RCAC account, please first consider archiving and compressing old files and moving them to long-term storage. If this option does not resolve the issue, you may send an email to rcac-help@purdue.edu and request additional space.
There are several options for archiving and compressing groups of files or directories on RCAC systems. All of the following tools are provided:
(compress file somefile.c)
$ zip somefile.zip somefile.c
(extract contents of somefile.zip)
$ unzip somefile.zip
(compress all files in a directory into one archive file)
$ zip -r somefile.zip somedirectory/
(compress all ".c" files in current directory into one archive file)
$ zip -r somefile.zip . -i \*.c

(archive file somefile.c)
$ tar cvf somefile.tar somefile.c
(archive and compress file somefile.c)
$ tar czvf somefile.tar.gz somefile.c
(list contents of archive somefile.tar)
$ tar tvf somefile.tar
(extract contents of somefile.tar)
$ tar xvf somefile.tar
(extract contents of gzipped archive somefile.tar.gz)
$ tar xzvf somefile.tar.gz
(archive and compress all files in a directory into one archive file)
$ tar czvf somefile.tar.gz somedirectory/
(archive and compress all ".c" files in current directory into one archive file)
$ tar czvf somefile.tar.gz *.c

(compress file somefile - also removes uncompressed file)
$ gzip somefile
(uncompress file somefile.gz - also removes compressed file)
$ gunzip somefile.gz

(compress file somefile - also removes uncompressed file)
$ bzip2 somefile
(uncompress file somefile.bz2 - also removes compressed file)
$ bunzip2 somefile.bz2
Windows users can work with these same formats using some of the following software:
There are a variety of ways to transfer data to and from RCAC systems. Which you should use depends on several factors, including ease of use for you personally, connection speed and bandwidth, and the size and number of files to be transferred. For more details on file transfer methods and applications, refer to the Radon Complete User Guide.
The third-party software on three commonly used RCAC systems is shown in the following table. Additional software may be available on other RCAC systems, and the software on a specific system can be seen by running the command "module avail" on that system. Please contact rcac-help@purdue.edu if you are interested in the availability of software not shown in this list.
| Radon | Steele | Julius/Caesar | |
| R | ✓ | ✓ | |
| AcGrace | ✓ | ||
| Amber | ✓ | ✓ | |
| ANSYS | ✓ | ✓ | |
| ATLAS | ✓ | ||
| BinUtils | ✓ | ||
| Boost | ✓ | ||
| ClustalX | ✓ | ||
| COMSOL | ✓ | ✓ | ✓ |
| CPLEX | ✓ | ✓ | ✓ |
| CUDA | ✓ | ||
| DX | ✓ | ||
| Ferret | ✓ | ||
| FFTW | ✓ | ✓ | |
| FLUENT | ✓ | ✓ | |
| GAMESS | ✓ | ✓ | |
| GAMS | ✓ | ✓ | |
| Gaussian | ✓ | ✓ | ✓ |
| GCC Compiler (C, C++, Fortran) | ✓ | ✓ | ✓ |
| GCC IA64 Cross-Compiler (xgcc-ia64) | ✓ | ||
| GMP | ✓ | ||
| GMT | ✓ | ✓ | |
| GrADS | ✓ | ||
| GROMACS | ✓ | ||
| GhostScript | ✓ | ||
| GSL | ✓ | ✓ | |
| HDF4 (Compiled for Intel, GNU, PGI) | ✓ | ||
| HDF5 (Compiled for Intel, GNU, PGI) | ✓ | ||
| ImageMagick | ✓ | ||
| IMSL | ✓ | ✓ | ✓ |
| Intel Compiler (C, C++, Fortran) | ✓ | ✓ | ✓ |
| Jasper | ✓ | ✓ | |
| Java | ✓ | ✓ | |
| LAM | ✓ | ||
| LAMMPS | ✓ | ✓ | |
| LSTC | ✓ | ||
| Maple | ✓ | ✓ | ✓ |
| Mathematica | ✓ | ✓ | ✓ |
| MATLAB | ✓ | ✓ | |
| Mitrionics FPGA Tools (mitrion) | ✓ | ||
| MPFR | ✓ | ✓ | ✓ |
| MPICH | ✓ | ✓ | ✓ |
| MPICH2 | ✓ | ✓ | |
| MPIExec | ✓ | ||
| MrBayes | ✓ | ||
| MUMPS | ✓ | ||
| MVAPICH (for Intel, PGI compilers) | ✓ | ||
| MVAPICH2 (for Intel, GNU, PGI compilers) | ✓ | ||
| MWRank | ✓ | ||
| NCBI | ✓ | ||
| NCL | ✓ | ✓ | |
| NCO | ✓ | ||
| NetCDF (for Intel, GNU, PGI compilers) | ✓ | ||
| NTL | ✓ | ||
| NWChem | ✓ | ✓ | |
| Octave | ✓ | ||
| PGI Compiler (C, C++, Fortran) | ✓ | ✓ | |
| PKG-Config | ✓ | ||
| Python | ✓ | ✓ | |
| RASC | ✓ | ✓ | |
| SAS | ✓ | ||
| ScaLAPACK | ✓ | ✓ | ✓ |
| Stata | ✓ | ||
| Subversion | ✓ | ✓ | ✓ |
| Tau | ✓ | ||
| TecPlot | ✓ | ||
| TotalView | ✓ | ✓ | ✓ |
| UDUNITS | ✓ | ||
| VASP | ✓ | ||
| Vis5D | ✓ |
RCAC uses the module command as the preferred method for a user to manage the processing environment. With this command, a user may load libraries and paths for using specific applications or compilers. These are organized into packages which may be loaded and unloaded as needed. Please use the module command and do not manually configure your environment, as RCAC staff will frequently make changes to the specifics of various packages. If you use the module command to manage your environment, these changes will not be noticeable.
Below follows a short introduction to the module command. You can see more in the man page for module. Typing module at the command line will give you a brief usage report.
To see what modules are available on this system, use the "module avail" command:
$ module avail
------------------------ /apps/host/modules/versions -------------------------
3.1.6
-------------------- /apps/host/modules3.1.6/modulefiles ---------------------
dot module-cvs module-info modules null use.own
----------------------- /apps/host/modules/modulefiles -----------------------
R/2.6.2
R/2.7.0
amber/10
ansys/11.0
dx/4.4.4
fftw/2.1.5
fftw/3.1.2
fluent/6.3.26
gamess/24.MAR.2007.R3(default)
gaussian/D.01
gaussian/E.01(default)
gaussian03/D.01
gaussian03/E.01(default)
gcc/4.3.0
...
You should note that all modules consist of both a name and a version number. When loading a module, you may use only the name to load the default version, or specify which version you wish to load:
$ module load intel            (load default Intel compiler)
$ module load intel/9.1.045    (load version 9.1.045 of the Intel compiler)
Note that you will need to load any relevant modules within job submission scripts that use those applications. Loading the module before submitting your job is not sufficient. If you use bash or ksh as your login shell, you will also need to add a line to any submission script to source /etc/profile before invoking "module". Users of csh and tcsh do not need to do this.
...
. /etc/profile
module load intel
...
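Putting this together, a complete submission file for a bash user might look roughly like the one written out by the sketch below. The job name, walltime, and program name `myprogram` are placeholders, and `PBS_O_WORKDIR` is the PBS variable that holds the directory from which the job was submitted. The sketch only writes the file and checks its syntax; it does not submit anything.

```shell
#!/bin/sh
# Sketch: write a small job submission file that loads its own modules.
# The job name, walltime, and program name are placeholders.
jobfile=$(mktemp "${TMPDIR:-/tmp}/myjob.XXXXXX")
cat > "$jobfile" <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -l walltime=00:30:00

# bash and ksh users: make the module command available inside the job.
. /etc/profile

# Load the compiler environment in the job itself, not only at login.
module load intel

cd "$PBS_O_WORKDIR"
./myprogram
EOF

sh -n "$jobfile" && echo "syntax OK: $jobfile"   # parse only; nothing is run
```

On the cluster, a file like this would then be handed to qsub.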
To unload a module, use the "module unload" command. It will attempt to undo the changes that module made to your environment:
$ module unload intel            (unload default Intel compiler)
$ module unload intel/9.1.045    (unload version 9.1.045 of the Intel compiler)
To see what modules you have currently loaded, use "module list":
$ module list
Currently Loaded Modulefiles:
1) intel/9.1.045
$ module unload intel
$ module list
No Modulefiles Currently Loaded.
Compilers are available on Radon for Fortran 77, Fortran 90, Fortran 95, C, and C++. The compilers can produce general-purpose and architecture-specific optimizations to improve performance. These include loop-level optimizations, inter-procedural analysis and cache optimizations. The compilers support automatic and user-directed parallelization of Fortran, C, and C++ applications for multiprocessing execution. More detailed documentation on each compiler set available on Radon follows.
To use the Intel compiler set (compilers and associated libraries) on Radon, load the "intel" module, using the "module" command.
Here are some examples for the Intel compilers:
| Language | Serial Program | MPI Program | OpenMP Program |
|---|---|---|---|
| Fortran77 | $ module load intel<br>$ ifort myprogram.f -o myprogram | $ module load mpich2-intel<br>$ mpif77 myprogram.f -o myprogram | $ module load intel<br>$ ifort -openmp myprogram.f -o myprogram |
| Fortran90 | $ module load intel<br>$ ifort myprogram.f90 -o myprogram | $ module load mpich2-intel<br>$ mpif90 myprogram.f90 -o myprogram | $ module load intel<br>$ ifort -openmp myprogram.f90 -o myprogram |
| Fortran95 | (not available) | (not available) | (not available) |
| C | $ module load intel<br>$ icc myprogram.c -o myprogram | $ module load mpich2-intel<br>$ mpicc myprogram.c -o myprogram | $ module load intel<br>$ icc -openmp myprogram.c -o myprogram |
| C++ | $ module load intel<br>$ icpc myprogram.cpp -o myprogram | $ module load mpich2-intel<br>$ mpiCC myprogram.cpp -o myprogram | $ module load intel<br>$ icpc -openmp myprogram.cpp -o myprogram |
Other versions of the Intel compiler and/or libraries may also be available. To see which versions are currently installed, use the command "module avail".
More information on compiler options can be found in the official man pages, which can be accessed if the appropriate module is loaded using the "man" command, or online here:
Here is some more documentation from other sources on the Intel compilers:
The official name of the GNU compilers is 'GNU Compiler Collection' or 'GCC'. To use the GNU compiler set (compilers and associated libraries) on Radon, load the "gcc" module, using the "module" command.
An older version of the GNU compiler will be in your path by default. Do NOT use this version. Instead, load the newer version using the "module" command.
Here are some examples for the GNU compilers:
| Language | Serial Program | MPI Program | OpenMP Program |
|---|---|---|---|
| Fortran77 | $ module load gcc<br>$ gfortran myprogram.f -o myprogram | $ module load mpich2-gcc<br>$ mpif77 myprogram.f -o myprogram | $ module load gcc<br>$ gfortran -fopenmp myprogram.f -o myprogram |
| Fortran90 | $ module load gcc<br>$ gfortran myprogram.f90 -o myprogram | $ module load mpich2-gcc<br>$ mpif90 myprogram.f90 -o myprogram | $ module load gcc<br>$ gfortran -fopenmp myprogram.f90 -o myprogram |
| Fortran95 | $ module load gcc<br>$ gfortran myprogram.f95 -o myprogram | (not available) | $ module load gcc<br>$ gfortran -fopenmp myprogram.f95 -o myprogram |
| C | $ module load gcc<br>$ gcc myprogram.c -o myprogram | $ module load mpich2-gcc<br>$ mpicc myprogram.c -o myprogram | $ module load gcc<br>$ gcc -fopenmp myprogram.c -o myprogram |
| C++ | $ module load gcc<br>$ g++ myprogram.cpp -o myprogram | $ module load mpich2-gcc<br>$ mpiCC myprogram.cpp -o myprogram | $ module load gcc<br>$ g++ -fopenmp myprogram.cpp -o myprogram |
Other versions of the GNU compiler and/or libraries may also be available. To see which versions are currently installed, use the command "module avail".
More information on compiler options can be found in the official man pages, which can be accessed if the appropriate module is loaded using the "man" command, or online here:
Here is some more documentation from other sources on the GCC compilers:
To use the PGI compiler set (compilers and associated libraries) on Radon, load the "pgi" module, using the "module" command.
Here are some examples for the PGI compilers:
| Language | Serial Program | MPI Program | OpenMP Program |
|---|---|---|---|
| Fortran77 | $ module load pgi<br>$ pgf77 myprogram.f -o myprogram | $ module load mpich2-pgi<br>$ mpif77 myprogram.f -o myprogram | $ module load pgi<br>$ pgf77 -mp myprogram.f -o myprogram |
| Fortran90 | $ module load pgi<br>$ pgf90 myprogram.f90 -o myprogram | $ module load mpich2-pgi<br>$ mpif90 myprogram.f90 -o myprogram | $ module load pgi<br>$ pgf90 -mp myprogram.f90 -o myprogram |
| Fortran95 | $ module load pgi<br>$ pgf95 myprogram.f95 -o myprogram | (not available) | $ module load pgi<br>$ pgf95 -mp myprogram.f95 -o myprogram |
| C | $ module load pgi<br>$ pgcc myprogram.c -o myprogram | $ module load mpich2-pgi<br>$ mpicc myprogram.c -o myprogram | $ module load pgi<br>$ pgcc -mp myprogram.c -o myprogram |
| C++ | $ module load pgi<br>$ pgCC myprogram.cpp -o myprogram | $ module load mpich2-pgi<br>$ mpiCC myprogram.cpp -o myprogram | $ module load pgi<br>$ pgCC -mp myprogram.cpp -o myprogram |
Other versions of the PGI compiler and/or libraries may also be available. To see which versions are currently installed, use the command "module avail".
More information on compiler options can be found in the official man pages, which can be accessed with the "man" command after loading the appropriate compiler module.
Here is some more documentation from other sources on the PGI compilers:
There are a number of different compilers and programs installed on the RCAC systems. To access them, use module load <program>. To see the available modules, type module avail. To read more about the "module" command, see the module command section.
There are two methods for submitting jobs to the Radon community cluster. First, you may submit jobs directly to a queue on Radon. These jobs may be serial, message-passing, or shared-memory in nature. You use the Portable Batch System (PBS) to submit jobs to a queue. Secondly, the Radon cluster is a part of BoilerGrid. You may submit serial jobs to BoilerGrid and specifically request that the serial jobs be run on the resources on Radon.
Radon uses PBS version 9.x. The newer versions have a few minor differences from the older versions (before 8.0).
Differences are mainly:
The Portable Batch System (PBS) is a richly featured workload management system providing job scheduling and job management interface on computing resources, including Linux clusters. With PBS, a user requests resources and submits a job to a queue. A description of Radon's queues follows further down.
Note that you should never run big, long, multi-threaded, or CPU-intensive jobs on the front-end host. The front-end hosts are community-owned, and running anything but the smallest test job will slow them down for everyone. Use PBS to submit the job as a job submission file (called a job script in the official manual) or run an interactive PBS job session.
To see a list of queues, type qstat -Q.
There are a total of 2 queues, but 'workq' is the default one to which everyone has access. It has a default wall time limit of 30 minutes, and maximum wall time limit of 336 hours.
No priorities are currently enforced on the ITaP/RCAC Linux clusters. All users have an equal chance at resources (note that this is not to say that all jobs have an equal chance at resources).
The nodes in Radon are partitioned into subclusters. By default, user jobs will be given nodes from the same subcluster (meaning that all machines will share an Ethernet switch and will have the highest possible interconnect bandwidth). If it is necessary to run jobs that span multiple subclusters, or to run a job that uses more than the number of nodes in any one subcluster, please contact rcac-help@purdue.edu.
The command qstat -Q will give you a list of all the queues. If you use qstat -Q -f it gives you more information about the various queues, including number of jobs, users and walltime.
Queues and some information about them:
| Name of queue | Node Count | Default walltime | Max walltime |
|---|---|---|---|
| workq | 240/240 | 00:30:00 | 336:00:00 |
| radon_hold | N/A | 00:30:00 | N/A |
Node Count is the number of accessible nodes for the queue. It is given as max/default.
Note that the number of processors that can be used for a certain queue is the total number of processors that are available to that queue. It does not mean that every submitter to that queue can get "Node Count" processors. For example, if a queue has a total of 32 accessable processors and one user has requested 16 processors and another has asked for 12 processors, then another submitter to that queue can only get 4 processors as long as the others are in use. If this user asks for more than 4 processors, his/her job will wait in the queue until enough processors are available.
A job submission file (job script) can contain any of the commands that you would otherwise issue yourself from the command line. You can, for example, both compile and run a program and also set any necessary environment variables. The results from compiling or running your programs can usually be seen after the job has run. They will show up in your directory as the files <script_name>.e<job number> and <script_name>.o<job number>. The first file contains any errors that were reported (hopefully none), and the second file contains anything your program would have written to the screen. If the program writes its results to a file, this still happens. The job number is a number which PBS assigns to every job and reports when the job is submitted.
It may take quite a while before the job finishes running. How long depends on, among other things, the number of nodes you have requested, how large the program is, which queue you are running it in, and how many other people are using the system at the same time.
A job submission file may consist of PBS directives, comments and executable statements. A PBS directive provides a way of specifying job attributes in addition to the command line options. For example:
#PBS -N Job_name
#PBS -l select=4:mem=320kb,walltime=10:30
#PBS -m be
#
step1 arg1 arg2
step2 arg3 arg4
The -N Job_name directive replaces script_name in the names of the error and output files. The qsub command scans the lines of the script file for directives. An initial line in the script that begins with the characters "#!" or the character ":" will be ignored and scanning will start with the next line. Scanning will continue until the first executable line, that is, a line that is not blank, not a directive line, and not a line whose first non-white space character is "#". Directives that occur after the first executable line are ignored.
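The scanning rule can be illustrated with a short shell sketch. This is an approximation written for this guide, not qsub's actual implementation; the function name scan_directives is hypothetical, and it assumes the directive prefix "#PBS" starts in the first column.

```bash
#!/bin/bash
# Approximation of the scanning rule (not qsub's own code). Reads a job
# script on stdin and prints the options of every "#PBS" line that would
# be honored under the rule described above.
scan_directives() {
    local line trimmed first=1
    while IFS= read -r line; do
        if [ "$first" -eq 1 ]; then
            first=0
            # An initial "#!" or ":" line is ignored.
            case "$line" in
                '#!'*|':'*) continue ;;
            esac
        fi
        case "$line" in
            '#PBS'*)
                # Directive: drop the prefix and the blanks after it.
                trimmed=${line#'#PBS'}
                printf '%s\n' "${trimmed#"${trimmed%%[![:space:]]*}"}"
                ;;
            *)
                # Blank lines and comments are skipped; anything else is
                # the first executable line and ends the scan.
                trimmed=${line#"${line%%[![:space:]]*}"}
                case "$trimmed" in
                    ''|'#'*) ;;
                    *) break ;;
                esac
                ;;
        esac
    done
}

scan_directives <<'EOF'
#!/bin/bash
#PBS -N Job_name
# an ordinary comment

#PBS -m be
step1 arg1 arg2
#PBS -l walltime=10:30
EOF
```

For this sample the sketch prints "-N Job_name" and "-m be". The walltime directive is not printed because it follows the first executable line.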
The remainder of the directive line consists of the options to qsub in the same syntax as they appear on the command line. The option character is to be preceded with the "-" character.
If an option is present in both a directive and on the command line, that option and its argument, if any, will be ignored in the directive. The command line takes precedence.
If an option is present in a directive and not on the command line, that option and its argument, if any, will be processed as if it had occurred on the command line.
How you run a program depends on whether it is a serial program, an OpenMP program, or an MPI program. There is no difference in how to run the program for the various compilers.
Important: You must 'module load' the same compiler (and MPICH2 if needed) that you used for compiling. Note that it is not necessary to load the standard compiler if you have loaded the corresponding compiler with the MPICH2 libraries included.
The command to submit the job submission file is the following:
qsub -q standby -l select=4,walltime=1:00 run_hello
This example submits a job to the queue 'standby' and requests 4 nodes with a walltime of 1 minute. The job submission file is called run_hello. Queue names differ between the various RCAC systems ('standby' is a queue on Steele; on Radon the default queue is 'workq'). You can find a list of the names with the command qstat -q or look at the section 'Queues'.
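Walltime strings such as 1:00 are easy to misread, so a small conversion to seconds may help when checking a request. The sketch below assumes the [[HH:]MM:]SS form, which matches the values used in this guide (1:00 is one minute, 336:00:00 is 336 hours); the function name walltime_to_seconds is hypothetical.

```bash
#!/bin/bash
# Convert a walltime string to seconds, assuming the [[HH:]MM:]SS form.
walltime_to_seconds() {
    local IFS=: fields field secs=0
    read -r -a fields <<< "$1"
    for field in "${fields[@]}"; do
        # 10# forces base 10 so that a field like "08" is not read as octal.
        secs=$((secs * 60 + 10#$field))
    done
    echo "$secs"
}

walltime_to_seconds 1:00        # prints 60
walltime_to_seconds 10:30       # prints 630
walltime_to_seconds 336:00:00   # prints 1209600
```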
Some useful options for the qsub command include (in the list below, note that a chunk is defined as a set of resources that are to be allocated as a unit):
Note that ncpus cannot be larger than the number of processors on each node on the machine in question.
Some environment variables can be set. They are then available to PBS. They include:
If you wish to interrupt qsub prior to job start (before you get a command-line prompt), this can be done by typing control-C. It will then query if the user wishes to exit. If the user responds "yes", qsub exits and that job is aborted.
Instead of using a job submission file, qsub also accepts commands from standard input - the keyboard. To use this option, avoid giving a script operand or give the single character "-". When the script is being read from Standard Input, qsub will copy the file to a temporary file. This temporary file is passed to the library interface routine pbs_submit. The temporary file is removed by qsub after pbs_submit returns or upon the receipt of a signal which would cause qsub to terminate.
Once the job has started execution, input to and output from the job pass through qsub. Keyboard-generated interrupts are passed to the job. Entries beginning with the tilde ('~') character and containing special sequences are escaped by qsub. The recognized escape sequences are:
~. Qsub terminates execution. The batch job is also terminated.
~susp Suspend the qsub program if running under the C shell. "susp" is the suspend character, usually Ctrl-Z.
~asusp Suspend the input half of qsub (terminal to job), but allow output to continue to be displayed. Only works under the C shell. "asusp" is the auxiliary suspend character, usually Ctrl-Y.
Note: The following warning applies for users of the c-shell, csh. If the job is executed under the csh and a .logout file exists in the home directory in which the job executes, the exit status of the job is that of the .logout script, not the job submission file. This may impact any interjob dependencies. To preserve the job exit status, either remove the .logout file or place the following line as the first line in the .logout file:
set EXITVAL = $status
and the following line as the last executable line in .logout
exit $EXITVAL
The command qstat -a shows the jobs currently in the system and their IDs.
Example (run on Steele):
$ qstat -a
steele-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
77025.steele-ad user123  standby  hello          --   1   8     -- 00:05 Q    --
115505.steele-a user456  ncn      job4         5601   1   1     -- 600:0 R 575:0
...
189479.steele-a user456  standby  AR4b           --   5  40     -- 04:00 H    --
189481.steele-a user789  standby  STDIN        1415   1   1     -- 00:30 R 00:07
189483.steele-a user789  standby  STDIN        1758   1   1     -- 00:30 R 00:07
189484.steele-a user456  standby  AR4b           --   5  40     -- 04:00 H    --
189485.steele-a user456  standby  AR4b           --   5  40     -- 04:00 Q    --
189486.steele-a user123  tg_workq STDIN          --   1   1     -- 12:00 Q    --
189490.steele-a user456  standby  job7        26655   1   8     -- 04:00 R 00:06
189491.steele-a user123  standby  job11          --   1   8     -- 04:00 Q    --
$
Where 'Q' = Queued, 'R' = Running, and 'H' = Held.
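When the listing is long, counting jobs per state can be quicker than reading it. The sketch below is only an illustration: the function name count_state is hypothetical, and it assumes that the state is the 10th whitespace-separated column, as in the listing above, and that job names contain no spaces.

```bash
#!/bin/bash
# Count jobs in a given state (Q, R or H) from "qstat -a" output on stdin.
# Assumption: the state is the 10th whitespace-separated column and job
# names contain no spaces. Header lines are skipped because they do not
# start with a numeric job ID.
count_state() {
    awk -v state="$1" '$1 ~ /^[0-9]+\./ && $10 == state { n++ } END { print n + 0 }'
}

# Sample rows copied from the listing above. On a cluster one would run:
#   qstat -a | count_state Q
sample='77025.steele-ad user123 standby hello -- 1 8 -- 00:05 Q --
115505.steele-a user456 ncn job4 5601 1 1 -- 600:0 R 575:0
189479.steele-a user456 standby AR4b -- 5 40 -- 04:00 H --
189485.steele-a user456 standby AR4b -- 5 40 -- 04:00 Q --'

printf '%s\n' "$sample" | count_state Q   # prints 2
printf '%s\n' "$sample" | count_state R   # prints 1
```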
The list can be very long, making it difficult to find your own runs. If that is the case, use the following command to ask for jobs submitted by a specific user:
$ qstat -a -u user123
steele-adm.rcac.purdue.edu:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
182792.steele-a user123  standby  job1        28422   1   4     -- 23:00 R 20:19
185841.steele-a user123  standby  job2        24445   1   4     -- 23:00 R 20:19
185844.steele-a user123  standby  job3        12999   1   4     -- 23:00 R 20:18
185847.steele-a user123  standby  job4        13151   1   4     -- 23:00 R 20:18
$
To stop a job before it finishes, use:
qdel <job id>
You get the job id from the qstat -a or qstat -a -u [username] command.
To use the PBS queue interactively, you have to use the -I option. The command to submit such a job looks like this:
qsub -I -q standby -l select=2:ncpus=2
Here -I requests an interactive job, -q standby selects the queue, and -l select=2:ncpus=2 requests two chunks with two processors each.
As mentioned, the -I option must be specified for the job to be interactive. After opening an interactive session, we may run programs in the normal way. For running serial programs, you should only ask for one "chunk". Parallel programs can be run with the preferred number of nodes, which should be specified with -l select=<# nodes> when qsub is started.
Note that ncpus cannot be larger than the number of processors on each node on the machine in question.
To open a display when running interactively, start the interactive session with the following command:
qsub -I -q <queue> -l select=<number of nodes>:ncpus=<number of processors> -v DISPLAY
To end an interactive job, just type exit. If you wish to interrupt qsub prior to job start, this can be done by typing control-C. It will then query if the user wishes to exit. If the user responds "yes", qsub exits and the job is aborted.
To see which nodes your job is using:
cat $PBS_NODEFILE
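The node file can contain the same host more than once, so a per-node count is often easier to read than the raw file. The following is a small sketch; the function name node_summary and the sample host names in the stand-in file are made up for illustration.

```bash
#!/bin/bash
# Print each distinct node in a PBS node file with its number of entries.
node_summary() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job:  node_summary "$PBS_NODEFILE"
# Outside PBS, a stand-in file with made-up contents shows the output.
demo_file=$(mktemp)
printf '%s\n' radon-b002 radon-b002 radon-b003 radon-b003 > "$demo_file"
node_summary "$demo_file"
rm -f "$demo_file"
```

For the stand-in file this prints "radon-b002 2" and "radon-b003 2".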
It is strongly suggested that you only use an interactive session for developmental tasks (such as debugging). Use a PBS job submission file when running the finished program.
A large part of submitting a job involves understanding how to request computing resources. This section contains examples of submitting PBS jobs, both using a batch script and interactively. There will be separate examples for MPI and OpenMP jobs. Note that the sections 'batch' and 'interactive' have some examples which might also be relevant for, say, MPI and OpenMP.
There are two ways to run a serial program under PBS: batch and interactively. For long jobs, batch submission is to be preferred. There is no difference in how you run a Fortran program, a C program or a C++ program, when they have been compiled.
Batch submission
Suppose that we want to run the C program 'hello.c' - where the executable is called 'hello'. Make a script and call it something meaningful, like run_hello. The script should then contain the following:
#!/bin/bash
cd $PBS_O_WORKDIR
./hello
Since PBS will always start in your home directory, you should either do a cd $PBS_O_WORKDIR (which returns you to the directory you submitted the script from) or give the full path to the program.
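A slightly more defensive variant of the directory change is sketched below. It is an optional pattern, not something PBS requires: the helper name enter_workdir is hypothetical, and the fallback to the current directory is only there so the script can also be tried by hand outside PBS, where PBS_O_WORKDIR is not set.

```bash
#!/bin/bash
# Move to the submission directory when running under PBS; stay in the
# current directory when the script is tried by hand outside PBS.
enter_workdir() {
    cd "${PBS_O_WORKDIR:-$PWD}" || return 1
}

# Stop early if the directory cannot be entered, rather than running the
# program from the wrong place.
enter_workdir || exit 1
echo "Running in $PWD"
# ./hello   # the program from the example above would be started here
```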
The command to submit the job is the following:
qsub -q standby -l select=1,walltime=1:00 run_hello
Here the job uses the queue 'standby' on Steele, 1 node, and a walltime of one minute. The job submission file is called run_hello. If you want to use the default queue, you do not need to ask for it explicitly.
Submitting this script gives the following result. It will take a while before the job completes:
$ qsub -q standby -l select=1,walltime=1:00 run_hello
91.steele-adm.rcac.purdue.edu
$
Doing a 'ls' in your directory will now show two new files:
$ ls
hello      run_hello
hello.c    run_hello.e91
hello.out  run_hello.o91
$
If everything went well, then the file 'run_hello.e91' will be empty, since it contains any error-messages your program gave while running. The file 'run_hello.o91' contains the output from your program. In this case the output is:
$ cat run_hello.o91
Hello World!
$
Interactively
To use the PBS queue interactively, you must first give a command like the one below. Remember to type 'cd $PBS_O_WORKDIR' (or the path to the working directory), since you will have been returned to your home directory upon start of the interactive job.
$ qsub -I -q standby -l select=1
qsub: waiting for job 189639.steele-adm.rcac.purdue.edu to start
qsub: job 189639.steele-adm.rcac.purdue.edu ready
$
Where we are running in the queue 'standby' on Steele, and asking for 1 node.
We can now run the job the same way a serial job is normally run. Remember, interactive sessions are mostly for testing purposes, and longer jobs should always be submitted using a job submission file.
$ ./hello
Hello World!
$
The path to MPICH2 or MPICH is probably not set up on your account. Try typing mpicc at the prompt. If you get 'mpicc: command not found', then you need to either use the 'module load' command or add the path to the compilers you use to your setup file (.cshrc, .tcsh, .bash, .login, or .profile). The compilers and MPICH2/MPICH can be found in /opt. They have to be loaded to be able to run an MPI program.
Using 'module load': The easiest (and preferred) way to access the compilers with MPICH2 included is module load mpich2-<compiler> (and module load <compiler> for the ordinary compilers), where <compiler> is one of: intel, gcc, or pgi. Use module avail to see all the possibilities.
Example (loading Intel compilers with MPICH2 included):
$ module load mpich2-intel
$
To learn how to compile an MPI program, see the section Compiling an MPI program.
Running your program
To run your programs, you will need to use the PBS queue. This can either be done interactively or by submitting a job using a job submission file.
There is no difference in how you run a Fortran program, a C program or a C++ program, when they have been compiled.
Submitting the job to PBS using a script
Let us say that we want to run the C program example 'hello.c'. Make a script and call it something meaningful, like run_hello. The script should then contain something like the following:
#!/bin/bash
source /etc/profile
module load mpich2-intel
mpirun -np 4 -machinefile $PBS_NODEFILE mpi/hello
It is necessary to include the 'source /etc/profile' under bash/ksh to be able to use 'module load mpich2-<compiler>'. The <compiler> can be either intel, gcc, or pgi. Last, the program is run with either mpiexec or mpirun. Since PBS always starts in your home directory, you should give the path to the program from there (here mpi/hello), or add cd $PBS_O_WORKDIR before running the program (this puts you in the directory you were in when issuing the qsub command).
To submit a job:
qsub -q workq -l select=4,walltime=1:00 run_hello
Here -q workq selects the queue, select=4 requests four chunks (nodes), and walltime=1:00 sets a wall time limit of one minute.
Submitting this script now gives the following result (it will take a while before the job is completed):
user123@radon-fe00:~/mpi$ qsub -q workq -l select=4,walltime=1:00 run_hello
119452.radon-adm.rcac.purdue.edu
user123@radon-fe00:~/mpi$
Doing a 'ls' in your directory will now show two new files:
user123@radon-fe00:~/mpi$ ls
hello      run_hello
hello.c    run_hello.e119452
hello.out  run_hello.o119452
user123@radon-fe00:~/mpi$
If everything went well, then the file 'run_hello.e119452' will be empty, since it contains any error-messages your program gave while running. The file 'run_hello.o119452' contains the output from your program. In this case the output is:
user123@radon-fe00:~/mpi$ less run_hello.o119452
Processor 2 of 4: Hello World!
Processor 3 of 4: Hello World!
Processor 1 of 4: Hello World!
Processor 0 of 4: Hello World!
user123@radon-fe00:~/mpi$
Mpiexec is a replacement program for the script mpirun, which is part of the MPICH2 (and MPICH) package. It is used to initialize a parallel job from within a PBS batch or interactive environment. Mpiexec uses the task manager library of PBS to spawn copies of the executable on the nodes in a PBS allocation. There are reasons to use mpiexec rather than a script (mpirun) or an external daemon (mpd):
Running interactively
Example (run on the queue 'workq' on Radon):
user123@radon-fe00:~/mpi$ qsub -I -q workq -l select=2:ncpus=2,walltime=8:00
qsub: waiting for job 119450.radon-adm.rcac.purdue.edu to start
qsub: job 119450.radon-adm.rcac.purdue.edu ready
user123@radon-b002:~$
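Here -I requests an interactive job, -q workq selects the queue, select=2:ncpus=2 requests two chunks with two processors each, and walltime=8:00 sets a wall time limit of 8 minutes.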
You can also just start up an interactive job without time constraints:
qsub -I -q workq -l select=4
(Where the options used mean we ask for 4 nodes and 1 processor on each node). To end the job, you then type: exit.
Note that running an interactive job without time constraints means that you will keep the nodes allocated for the default time limit for that queue. If this is shorter than the time you need, your job will not finish. If, on the other hand, it is longer than what you need, you are keeping those nodes from other people's usage. Therefore, use this with caution.
Running the above PBS command gives:
user123@radon-fe00:~/mpi$ qsub -I -q workq -l select=4
qsub: waiting for job 119451.radon-adm.rcac.purdue.edu to start
qsub: job 119451.radon-adm.rcac.purdue.edu ready
user123@radon-b002:~$
We then need to change to the directory where our program is located. To run a program we use mpirun or mpiexec. You cannot just start the program with ./program, since it will then run as a single process.
mpirun: To run a program with mpirun, you issue the following command (remember to do 'module load mpich2-<compiler>', source your file which contains the path, or simply give the full path to mpirun, if you haven't added either to your setup). We need to add '-machinefile $PBS_NODEFILE' for running with mpirun on Radon - this is not the case if we use mpiexec:
mpirun -np <number of tasks> -machinefile $PBS_NODEFILE program
Running this on our program 'hello', for 4 tasks results in:
user123@radon-b002:~/mpi$ mpirun -np 4 -machinefile $PBS_NODEFILE hello
Processor 2 of 4: Hello World!
Processor 1 of 4: Hello World!
Processor 3 of 4: Hello World!
Processor 0 of 4: Hello World!
user123@radon-b002:~/mpi$
Note that the order in which the processors report is random and can differ from run to run. Without explicit synchronization, a parallel program does not control this order.
mpiexec: To run with mpiexec, use the following command (only give number of tasks if you do not want to use all you asked for when entering the queue):
mpiexec -n <number of tasks> program
It is not necessary to give the -n <number of tasks>, unless you wish to use a different amount than what you asked for when the job was originally started.
user123@radon-b002:~/mpi$ mpiexec hello
Processor 2 of 4: Hello World!
Processor 0 of 4: Hello World!
Processor 3 of 4: Hello World!
Processor 1 of 4: Hello World!
user123@radon-b002:~/mpi$ mpiexec -n 2 hello
Processor 1 of 2: Hello World!
Processor 0 of 2: Hello World!
user123@radon-b002:~/mpi$
To see which nodes you are using:
cat $PBS_NODEFILE
Notes
Common mistakes
/* Statically allocated arrays: &array and array refer to the same address. */
int array[5000];
int subarray[5000/4];
. . .
MPI_Scatter(&array, sendcount, MPI_INT, &subarray, recvcount, MPI_INT,
            0, MPI_COMM_WORLD);

/* Dynamically allocated arrays: pass the pointers themselves (array, not &array). */
int *array;
int *subarray;
. . .
array = (int *)malloc(array_size*sizeof(int));
subarray = (int *)malloc(subarray_size*sizeof(int));
. . .
MPI_Scatter(array, sendcount, MPI_INT, subarray, recvcount, MPI_INT,
            0, MPI_COMM_WORLD);
C/MPI programming examples
Most of the programs below are my answers to the exercises in the online "Introduction to MPI" course at NCSA.
Diagnostic Error messages from MPI
Click here and go to chapter 5 (p. 121) to see what the diagnostic error messages from MPI mean.
Extra examples of MPI programs
To see a few other examples of running MPI programs go here.
Condor allows users to run jobs on systems which would otherwise be idle, for as long as those systems are not needed by their primary users. Condor is one of several distributed computing systems RCAC makes available. Most RCAC resources, in addition to being available through normal means, are a part of BoilerGrid and can be used via Condor. If a primary user needs a machine, the Condor job is immediately checkpointed and/or migrated and the resource is made available. Thus, shorter jobs will have a better completion rate via Condor than longer jobs; however, even though jobs may have to be restarted elsewhere, BoilerGrid can offer a vast amount of computational resources to users. Not only are nearly all RCAC systems part of BoilerGrid, but so are large numbers of lab machines at the West Lafayette and other Purdue campuses. BoilerGrid is one of the largest Condor pools in the world. Some machines at other institutions are also a part of a larger Condor federation known as DiaGrid and can be used as well. For more information, refer to the BoilerGrid documentation.
####################
#
# Example 1
# Simple condor job description file
#
####################
Executable = hello
Log = hello.log
Queue
Example 2
In this example (from the Condor manual), we queue two copies of the program mathematica. The first copy will run in directory run_1, and the second will run in directory run_2. For both queued copies, stdin will be test.data, stdout will be loop.out, and stderr will be loop.error. There will be two sets of files written, as the files are each written to their own directories. This is a convenient way to organize data if you have a large group of Condor jobs to run. The example file shows program submission of mathematica as a vanilla universe job. This may be necessary if the source and/or object code to program mathematica is not available.
####################
#
# Example 2: demonstrate use of multiple
# directories for data organization.
#
####################
Executable = mathematica
Universe = vanilla
input = test.data
output = loop.out
error = loop.error
Log = loop.log
Initialdir = run_1
Queue
Initialdir = run_2
Queue
Example 3
In this example (also from the Condor manual), the submit description file queues 150 runs of program foo which has been compiled and linked for Silicon Graphics workstations running IRIX 6.5. This job requires Condor to run the program on machines which have at least 32 megabytes of physical memory, and expresses a preference to run the program on machines with at least 64 megabytes, if such machines are available. It also advises Condor that it will use up to 28 megabytes of memory when running. Each of the 150 runs of the program is given its own process number, starting with process number 0. So, files stdin, stdout, and stderr will refer to in.0, out.0, and err.0 for the first run of the program, in.1, out.1, and err.1 for the second run of the program, and so forth. A log file containing entries about when and where Condor runs, checkpoints, and migrates processes for the 150 queued programs will be written into file foo.log.
####################
#
# Example 3: Show off some fancy features including
# use of pre-defined macros and logging.
#
####################
Executable = foo
Requirements = Memory >= 32 && OpSys == "IRIX65" && Arch =="SGI"
Rank = Memory >= 64
Image_Size = 28 Meg
Error = err.$(Process)
Input = in.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 150
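To make the $(Process) naming in Example 3 concrete, the short shell sketch below lists the file names each run would use. It only imitates the macro expansion for illustration and does not involve Condor; the function name process_files is hypothetical.

```bash
#!/bin/bash
# List the per-run file names that $(Process) expands to for N queued runs,
# with process numbers starting at 0 as described above.
process_files() {
    local n=$1 i
    for ((i = 0; i < n; i++)); do
        echo "in.$i out.$i err.$i"
    done
}

# First three of the 150 runs in Example 3.
process_files 3
```

This prints "in.0 out.0 err.0", "in.1 out.1 err.1", and "in.2 out.2 err.2". A loop like this can also be adapted to check that every input file exists before submitting.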
condor_submit file
condor_submit run_hello (my submit description file is called run_hello).
user123@radon:~$ condor_q user123

-- Submitter: radon.rcac.purdue.edu : <128.210.9.35:35407> : radon.rcac.purdue.edu
 ID        OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE CMD
260187.0   user123    8/30 13:59    0+00:00:00 I  0   19.5 hello

1 jobs; 1 idle, 0 running, 0 held
user123@radon:~$ condor_prio -p -15 260187.0
user123@radon:~$ condor_q user123

-- Submitter: radon.rcac.purdue.edu : <128.210.9.35:35407> : radon.rcac.purdue.edu
 ID        OWNER      SUBMITTED     RUN_TIME   ST PRI SIZE CMD
260187.0   user123    8/30 13:59    0+00:00:03 R  -15 19.5 hello

1 jobs; 0 idle, 1 running, 0 held
user123@radon:~$
-bash-3.00$ condor_status -submitters

Name                    Machine     Running  IdleJobs  HeldJobs

user4@rcac.purdue.      hamlet.rca  0        0         5
user5@r                 hamlet.rca  792      0         0
user6@rcac.purdue.      hamlet.rca  0        0         0
user7@rcac.purdue.      hamlet.rca  0        0         1
user8@rcac.purdue.edu   hamlet.rca  0        0         228
user5@r                 lear.rcac.  0        0         1
user9@rcac.purdue.e     radon.rcac  0        6         2
user10@rcac.purdue      radon.rcac  0        0         1
user7@rcac.purdue       radon.rcac  0        1         2
user6@rcac.purdue.e     radon.rcac  0        0         2
user5@r                 radon.rcac  882      0         0
user11@rcac.purdue      radon.rcac  0        0         1
user12@rcac.purdue.e    radon.rcac  0        0         5
user13@rcac.purdue.     radon.rcac  0        186       2
user14@rcac.purdue      radon.rcac  1000     0         0
user15@rcac.p           steele-fe0  0        220       1
user16@rcac.purdue.ed   steele-fe0  0        0         1
user17@rcac.purdue      steele-fe0  0        37472     1
tg_user1@rcac.          tg-gatekee  0        1         0
tg_user2@rcac.purdue.e  tg-login64  0        1         0

                        RunningJobs  IdleJobs  HeldJobs

user9@rcac.purdue.e     0            6         2
user7@rcac.purdue       0            0         1
user15@rcac.p           0            220       1
tg_user1@rcac.          0            1         0
user12@rcac.purdue.     0            0         5
user7@rcac.purdue       0            1         2
user6@rcac.purdue.e     0            0         2
user11@rcac.purdue.ed   0            0         1
user5@r                 1674         0         1
user17@rcac.purdue      0            37472     1
tg_user1@rcac.purdue.e  0            1         0
user6@rcac.purdue.      0            0         0
user16@rcac.purdue      0            0         1
user12@rcac.purdue.e    0            0         5
user13@rcac.purdue.     0            186       2
user14@rcac.purdue      1000         0         0
user10@rcac.purdue.     0            0         1
user8@rcac.purdue.edu   0            0         228

Total                   2674         37887     253
-bash-3.00$
user123@radon:~$ condor_status -constraint 'RemoteUser == "user123@rcac.purdue.edu"'

Name           OpSys  Arch   State    Activity  LoadAv  Mem  ActvtyTime

ba-005.rcac.p  LINUX  INTEL  Claimed  Busy      1.000   502  0+00:24:44
ba-006.rcac.p  LINUX  INTEL  Claimed  Busy      0.990   502  0+00:20:22
ba-007.rcac.p  LINUX  INTEL  Claimed  Busy      1.000   502  0+00:23:16
ba-008.rcac.p  LINUX  INTEL  Claimed  Busy      1.000   502  0+00:30:20
...
user123@radon:~$
Submitter: radon.rcac.purdue.edu : <128.210.9.35:35407> : radon.rcac.purdue.edu
 ID        OWNER            SUBMITTED     RUN_TIME   ST PRI SIZE  CMD
...
260076.7   nice-user.user1  8/18 00:05    0+00:21:47 I  0   29.3  startfah.sh -oneun
260076.9   nice-user.user1  8/18 00:05    0+01:40:44 I  0   136.7 startfah.sh -oneun
260185.0   user123          8/30 13:01    0+00:00:00 R  0   19.5  hello
...
user123@radon:~$ condor_rm 260185.0
Job 260185.0 marked for removal
user123@radon:~$ condor_q
Submitter: radon.rcac.purdue.edu : <128.210.9.35:35407> : radon.rcac.purdue.edu
 ID        OWNER            SUBMITTED     RUN_TIME   ST PRI SIZE  CMD
...
260076.7   nice-user.user1  8/18 00:05    0+00:21:47 I  0   29.3  startfah.sh -oneun
260076.9   nice-user.user1  8/18 00:05    0+01:40:44 I  0   136.7 startfah.sh -oneun
...
condor_compile <compiler> <program>.<extension> -o <program name>
condor_compile gcc hello.c -o hello
Executable = hello
Log = hello.log
Output = hello.out
Queue
user123@radon:~$ condor_submit run_hello
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 260182.
user123@radon:~$
000 (260182.000.000) 08/29 16:21:31 Job submitted from host: <128.210.9.35:35407>
...
001 (260182.000.000) 08/29 16:22:42 Job executing on host: <128.211.131.51:32780>
...
005 (260182.000.000) 08/29 16:22:42 Job terminated.
    (1) Normal termination (return value 13)
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
    830 - Run Bytes Sent By Job
    13490672 - Run Bytes Received By Job
    830 - Total Bytes Sent By Job
    13490672 - Total Bytes Received By Job
...
Hello World!
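When many jobs are submitted, reading each log by hand becomes tedious. As a rough sketch, the return value recorded in a job log like the one shown earlier can be pulled out with sed. The function name log_return_value is hypothetical, and the pattern assumes the "Normal termination (return value N)" wording that appears in that log.

```bash
#!/bin/bash
# Extract the program's return value from a Condor job log read on stdin.
# Assumes the "Normal termination (return value N)" wording of the log
# shown earlier; prints nothing if no such line is present.
log_return_value() {
    sed -n 's/.*Normal termination (return value \([0-9][0-9]*\)).*/\1/p'
}

# Two lines copied from the hello.log example; with a real log one would run:
#   log_return_value < hello.log
printf '%s\n' \
    '005 (260182.000.000) 08/29 16:22:42 Job terminated.' \
    '    (1) Normal termination (return value 13)' | log_return_value
```

For the sample lines this prints 13.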