Coates User Guide

    Overview of Coates
        Overview of Coates

    Common Error Messages
        cannot connect to X server
        E233: cannot open display
        How do I check my job output while it is running?
        bash: command not found
        qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu
        bash: module command not found
        /usr/bin/xauth: error in locking authority file
        My SSH connection hangs

    Common Questions
        How can my collaborators outside Purdue get access to Coates?
        How can I get email alerts about my PBS job status?
        How can I get access to Sentaurus software?
        Can I share data with outside collaborators?
        Can I get a private server from RCAC?

    Biography of Clarence L. Coates
        Overview of Clarence L. Coates


Overview of Coates

Coates was a compute cluster operated by ITaP and a member of Purdue's Community Cluster Program. ITaP installed Coates on July 21, 2009; at the time, it was the largest entirely 10 Gigabit Ethernet (10GigE) academic cluster in the world. Coates consisted of 982 64-bit, 8-core Hewlett-Packard ProLiant systems and 11 64-bit, 16-core Hewlett-Packard ProLiant DL585 G5 systems, with between 16 GB and 128 GB of memory per node. All nodes had 10 Gigabit Ethernet interconnects and a 5-year warranty. Coates was decommissioned on September 30, 2014.

Detailed Hardware Specification

Sub-Cluster  Number of Nodes  Processors per Node             Cores per Node  Memory per Node  Interconnect  Disk
-A           640              Two 2.5 GHz Quad-Core AMD 2380  8               32 GB            10 GigE       500 GB
-B           45               Two 2.5 GHz Quad-Core AMD 2380  8               32 GB            10 GigE       2 TB
-C           264              Two 2.5 GHz Quad-Core AMD 2380  8               16 GB            10 GigE       500 GB
-D           33               Two 2.5 GHz Quad-Core AMD 2380  8               16 GB            10 GigE       2 TB
-E           11               Four 2.5 GHz Quad-Core AMD 8380 16              128 GB           10 GigE       2 TB
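
As a quick cross-check of the table, the per-sub-cluster figures can be totaled with a few lines of shell. This is only a sketch: the function name cluster_totals is made up for illustration, and each input line carries the node count, cores per node, and memory per node in GB, copied from the rows above.

```shell
# Sum nodes, cores, and memory over "nodes cores_per_node memory_gb" lines read from stdin.
cluster_totals() {
    awk '{ nodes += $1; cores += $1 * $2; mem_gb += $1 * $3 }
         END { printf "%d nodes, %d cores, %d GB\n", nodes, cores, mem_gb }'
}

# One line per sub-cluster, -A through -E, taken from the table.
cluster_totals <<'EOF'
640 8 32
45 8 32
264 8 16
33 8 16
11 16 128
EOF
```

The first four input lines account for the 982 eight-core systems and the last line for the 11 sixteen-core systems mentioned in the overview.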

Coates nodes ran Red Hat Enterprise Linux 5 (RHEL5) and used Moab Workload Manager 7 and TORQUE Resource Manager 4 as the portable batch system (PBS) for resource and job management. Coates also ran jobs for BoilerGrid whenever its processor cores would otherwise have been idle.

Common Error Messages

cannot connect to X server

Problem

You receive the following message after entering a command to bring up a graphical window:

cannot connect to X server

Solution

This can happen for several reasons:

  • Reason 1: Your SSH client software does not support graphical display by itself (e.g. SecureCRT or PuTTY).
    • Solution: Use client software with built-in graphical support, such as ThinLinc or MobaXterm.
  • Reason 2: You did not enable X11 forwarding in your SSH connection.
    • Solution: On Windows, make sure that X11 forwarding is enabled in your connection settings (e.g. in MobaXterm or PuTTY). On Linux, connect with:

      ssh -Y -l username hostname

  • Reason 3: You are opening a graphical window within an interactive job without X11 forwarding. After connecting to the front-end as described in the previous step(s), make sure you pass the -X option to qsub.
  • Reason 4: If none of the above applies, make sure that your home directory is within its quota. You can check your quota with myquota.
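
Before working through the reasons above, it can help to check whether the shell has a display configured at all. The following is a minimal sketch, not an official RCAC tool: the function name check_display is hypothetical, and it relies only on the fact that SSH sets the DISPLAY environment variable when X11 forwarding is active.

```shell
# Return success only if an X display is configured for this shell.
# With working X11 forwarding, ssh sets DISPLAY (for example localhost:10.0).
check_display() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to ${DISPLAY}"
    else
        echo "DISPLAY is not set; reconnect with X11 forwarding (ssh -Y)" >&2
        return 1
    fi
}

# Example: only start a graphical program when a display is available.
# "xterm" is just a placeholder for your own application.
if check_display; then
    echo "ok to start a graphical program such as xterm"
fi
```

A set DISPLAY does not guarantee that the window will open (the quota problem in Reason 4 can still interfere), but an empty one points to Reason 1, 2, or 3.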

E233: cannot open display

Problem

You receive the following message after entering a command to bring up a graphical window:

E233: cannot open display

Solution

This means you did not enable X11 forwarding, which provides remote graphical access to applications. Reconnect with X11 forwarding enabled:

ssh -Y -l username hostname

How do I check my job output while it is running?

Problem

After submitting your job to the cluster, you want to see the output that it generates.

Solution

There are two simple ways to do this:

  • qpeek: Use the qpeek tool to check the job's output. The syntax is:
    qpeek <jobid>
  • Redirect your output to a file: Edit the main command in your job script as shown below. Note the redirection operator, the greater-than (>) sign.
    myapplication ...other arguments... > "${PBS_JOBID}.output"
    Then, on any front-end, go to the job's working directory and check the end of the output file:
    tail "<jobid>.output"
    Replace <jobid> with the ID of your job.
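
The redirection approach can be sketched end to end as follows. This is an illustration under stated assumptions: myapplication is replaced by a stand-in shell function, the fallback job ID 12345.example is a placeholder used only when the script runs outside the batch system, and a temporary directory stands in for the job's working directory.

```shell
#!/bin/bash
# Stand-in for your real program; it writes to both stdout and stderr.
myapplication() {
    echo "step 1 done"
    echo "warning: example message" >&2
}

# Inside a job, PBS sets PBS_JOBID; the fallback value is a placeholder
# so that the sketch also runs outside the batch system.
job_id="${PBS_JOBID:-12345.example}"

# A temporary directory stands in for the job's working directory here.
work_dir="$(mktemp -d)"
output_file="${work_dir}/${job_id}.output"

# "> file" captures stdout; adding "2>&1" also captures stderr in the same file.
myapplication > "$output_file" 2>&1

# From a front-end, look at the most recent lines of the output.
tail -n 20 "$output_file"
```

While a job is still running, tail -f "<jobid>.output" keeps printing new lines as they arrive; press Ctrl-C to stop following.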

bash: command not found

Problem

You receive the following message after typing a command:

bash: command not found

Solution

This means the shell cannot find your command in any directory on your PATH. Typically, you need to load the module that provides it.
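
A script can test for a command before using it, which gives a clearer message than the bare error. The sketch below uses the standard shell builtin command -v; the helper name have_command and the placeholder command name myapplication are illustrative only.

```shell
# Succeed if the named command can be found by the shell.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# "myapplication" is a placeholder for the command that was not found.
cmd="myapplication"
if have_command "$cmd"; then
    echo "$cmd found at: $(command -v "$cmd")"
else
    echo "$cmd is not on your PATH; try 'module avail' and 'module load <name>'"
fi
```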

qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu

Problem

You receive the following message after attempting to delete a job with the 'qdel' command:

qdel: Server could not connect to MOM 12345.rice-adm.rcac.purdue.edu

Solution

This error usually indicates that at least one node running your job has stopped responding or crashed. Please forward the job ID to rcac-help@purdue.edu, and ITaP Research Computing staff can help remove the job from the queue.

bash: module command not found

Problem

You receive the following message after typing a command, e.g. module load intel:

bash: module command not found

Solution

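The shell cannot find the module command. Source the modules.sh file before the first use of module, for example near the top of your job script:

source /etc/profile.d/modules.sh

Alternatively, make the first line of your script start bash as an interactive shell, which reads your shell startup files, where module is normally defined:

#!/bin/bash -i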
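
In a script that may run both interactively and in batch, the sourcing step can be made conditional. This is a minimal sketch: the function name ensure_module_command is hypothetical, and the default path is the one given above (pass a different path as the first argument if your system differs).

```shell
# Make the "module" command available in a non-interactive script.
# The optional argument overrides the init script path.
ensure_module_command() {
    if command -v module >/dev/null 2>&1; then
        return 0                       # already defined, nothing to do
    fi
    module_init="${1:-/etc/profile.d/modules.sh}"
    if [ -r "$module_init" ]; then
        . "$module_init"               # same effect as "source" in bash
    fi
    command -v module >/dev/null 2>&1  # report whether it worked
}

if ensure_module_command; then
    echo "module command is available"
else
    echo "module command is still missing; check the init script path" >&2
fi
```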

/usr/bin/xauth: error in locking authority file

Problem

You receive the following message when logging in:

/usr/bin/xauth: error in locking authority file

Solution

Your home directory has reached its disk quota. You can check your quota with myquota.

You will need to free up space in your home directory.
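
To decide what to delete, it helps to see which entries take the most space. The sketch below uses only the standard du, sort, and head tools; the function name largest_entries is made up for illustration, and myquota itself is not called.

```shell
# List the N largest entries directly under a directory, sizes in kilobytes.
# Hidden entries (names starting with ".") are not matched by "*" and are skipped.
largest_entries() {
    du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, largest_entries "$HOME" 10 prints the ten largest files and directories at the top level of your home directory, largest first.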

My SSH connection hangs

Problem

Your console hangs while trying to connect to an RCAC server.

Solution

This can happen for various reasons. The most common causes of a hanging SSH terminal are:

  • Network: If you are connected over Wi-Fi, make sure that your Internet connection is working.
  • Busy front-end server: When you connect to a cluster, you SSH to one of the front-ends. Due to transient user loads, one or more of the front-ends may become unresponsive for a short while. To work around this, try reconnecting to the cluster, or wait until the load on the server you connected to has dropped.
  • File system issue: If a server has issues with one or more of the file systems (home, scratch, or depot), it may freeze your terminal. To work around this, connect to another front-end.

If none of the suggestions above works, please contact rcac-help@purdue.edu and specify the name of the server where your console hangs.

Common Questions

How can my collaborators outside Purdue get access to Coates?

Your Departmental Business Office can submit a Request for Privileges (R4P) to provide access to collaborators outside Purdue, including recent graduates. Once the R4P process is complete, you will need to add your outside collaborators to Coates as you would for any Purdue collaborator.

How can I get email alerts about my PBS job status?

Question

How can I be notified when my PBS job starts running and whether it completed successfully?

Answer

Submit your job with the following command-line arguments:

qsub -M email_address -m bea myjobsubmissionfile

Or include the following in your job submission file:

#PBS -M email_address
#PBS -m bae

The -m option accepts any combination of the letters "a", "b", and "e", in any order:

a - mail is sent when the job is aborted by the batch system.
b - mail is sent when the job begins execution.
e - mail is sent when the job terminates.
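
To double-check a letter combination before submitting, the meanings above can be encoded in a small helper. This is a sketch only: describe_mail_options is a hypothetical function, not part of PBS, and it recognizes just the three letters listed here.

```shell
# Print the meaning of each letter in a "-m" argument such as "bea".
# Returns non-zero if a letter other than a, b, or e is found.
describe_mail_options() {
    opts="$1"
    while [ -n "$opts" ]; do
        letter="${opts%"${opts#?}"}"   # first character of the remaining string
        opts="${opts#?}"               # drop that character
        case "$letter" in
            a) echo "a: mail is sent when the job is aborted by the batch system" ;;
            b) echo "b: mail is sent when the job begins execution" ;;
            e) echo "e: mail is sent when the job terminates" ;;
            *) echo "unknown letter: $letter" >&2; return 1 ;;
        esac
    done
}

describe_mail_options bea
```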

How can I get access to Sentaurus software?

Question

How can I get access to Sentaurus tools for micro- and nano-electronics design?

Answer

The Sentaurus software license requires a signed NDA. Please contact Dr. Mark Johnson, Director of ECE Instructional Laboratories, to complete the process.

Once the licensing process is complete and you have been added to the cae2 Unix group, you can use Sentaurus on RCAC community clusters by loading the corresponding environment module:

module load sentaurus
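
If the module does not work as expected, one thing to check is whether your login session already shows the group membership. The following sketch uses the standard id command; the helper name in_group is illustrative, and cae2 is the group named above.

```shell
# Succeed if the current user belongs to the named Unix group.
in_group() {
    id -Gn 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group cae2; then
    echo "cae2 membership found; 'module load sentaurus' should be usable"
else
    echo "not in cae2 yet; new group membership usually requires a fresh login"
fi
```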

Can I share data with outside collaborators?

Yes! Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many computing resources at other institutions. See the Globus documentation for instructions on sharing data.

Can I get a private server from RCAC?

Question

Can I get a private (virtual or physical) server from RCAC?

Answer

Often, researchers may want a private server to run databases, web servers, or other software. RCAC currently does not offer private servers (formerly known as "Firebox").

For use cases like this, we recommend the Jetstream Cloud (http://jetstream-cloud.org/), an NSF-funded science cloud allocated through the XSEDE project. RCAC staff can help you get access to Jetstream for testing, or help write an allocation proposal for larger projects.

Alternatively, you may consider commercial cloud providers such as Amazon Web Services, Azure, or Digital Ocean. These services are very flexible, but do come with a monetary cost.

Biography of Clarence L. Coates

Clarence L. Coates

Clarence L. "Ben" Coates came to Purdue in 1973 to head the School of Electrical Engineering (now Electrical and Computer Engineering) where, for the next decade, he emphasized computer education and the development of computing facilities. He was a driving force behind the high performance computing and networking plan that led to the creation of the Engineering Computer Network (ECN) serving all of Purdue's engineering schools. He also initiated a degree program in computer engineering at Purdue. He returned to teaching in the computer field full-time in 1983 before retiring in 1988.

A Nebraska native and Navy veteran of World War II, Professor Coates taught electrical engineering and computer science at the universities of Illinois, Kansas and Texas and at Rensselaer Polytechnic Institute before coming to Purdue. He supervised the engineering computer facilities at Texas and started a graduate program in information sciences. At Illinois, he directed the Coordinated Science Laboratory, an interdisciplinary lab focused on computers, information processing and electronics. He was also a research scientist at the General Electric Research Laboratory in New York and held five patents involving waveform recognition devices, circuit gates and accumulators on computer chips. Prof. Coates died in Florida on October 25, 2000, at age 76.

Purdue University, 610 Purdue Mall, West Lafayette, IN 47907, (765) 494-4600

© 2017 Purdue University | An equal access/equal opportunity university | Copyright Complaints | Maintained by ITaP Research Computing
