Frequently Asked Questions

Some common questions, errors, and problems are categorized below.

About Gilbreth

Can you remove me from the Gilbreth mailing list?

Your subscription to the Gilbreth mailing list is tied to your account on Gilbreth. If you are no longer using your account on Gilbreth, you can delete it from the My Accounts page. Hover over the resource you wish to remove yourself from and click the red 'X' button. Your account and mailing list subscription will be removed overnight. Be sure to make a copy of any data you wish to keep first.

How is Gilbreth different from other Community Clusters?

Gilbreth differs from the previous Community Clusters in several significant aspects:

  • Each Gilbreth node contains 2 Nvidia Tesla V100 accelerator cards, which can significantly improve the performance of compute-intensive workloads.
  • Each Gilbreth front-end contains one Nvidia Tesla V100 accelerator card. This makes GPU code development and testing much simpler.
  • GPU-enabled applications have both non-GPU and GPU-enabled versions installed. Typically, GPU-enabled versions are tagged with gpu in their module name, e.g., lammps/31Mar17_gpu is the GPU-enabled version of LAMMPS, while lammps/31Mar17 is the non-GPU version of LAMMPS.
  • An exception to the above rule is licensed software such as Abaqus, Ansys, and Matlab, where a single module contains both the non-GPU and GPU-enabled versions.
  • A selection of GPU-enabled application containers from the Nvidia GPU Cloud (NGC) collection is installed.

Do I need to do anything to my firewall to access Gilbreth?

No firewall changes are needed to access Gilbreth. However, to access data through Network Drives (i.e., CIFS, "Z: Drive"), you must be on a Purdue campus network or connected through VPN.

Logging In & Accounts

Errors

Account creation failed

An email came into rcac-help from the automated account checker saying that an account creation failed. A few scenarios can cause this, and there are a few things to check.

Account not created

First, check which resource the user was added to and the corresponding role status on the User Search page.

Take the following steps for these scenarios:

No Role

This means either our website failed and didn't add the role (rare, but there is a known bug where the request fails when a faculty member requests Radon/Hathi for themselves) or IAMO rejected the role.

You can try manually adding the role through the tool and see if it is rejected again, or ask IAMO about the status and whether the role can be added (see below).

Role Pending

This means one of two things: IAMO's overnight process failed, or the account was added just past the cutoff for the overnight process but before the account check ran.

In the former scenario, something went wrong on IAMO's side. Usually Ben is on top of things and gets them sorted quickly when he gets in in the morning, but if it's afternoon and the account is still not there, ask IAMO about it.

For the latter scenario, there is a very narrow window (something like ~4-5am) when users can be added and trigger a false alarm. It's rare, but it happens from time to time when a faculty member is a night owl or early bird, or is traveling abroad.

Role Ready

There are two scenarios here: IAMO's overnight process failed and has already been fixed, or the transd is broken on our end.

In the first scenario, there probably isn't anything to do. You can verify their account with ldapsearch -x uid=USERNAME | grep host and see if they have the proper host entry. If they do, they should be able to log in.

In the second scenario, the next step is to investigate the transd. The transd translates packets from IAMO into accounts on our systems. Log into xenon.rcac and look at /var/log/transd_log. Is there recent activity at the end of the log? If the end of the log is stale, something is probably stuck, such as a full disk. In this case, assign the ticket to systems and ask them to look at it. If the log has recent activity, you should be able to grep it for the username and look for account entries for them. If the transd is running, further investigation is probably needed.
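The log check described above can be sketched as a small shell helper. This is only an illustration: the function name check_log and the 60-minute staleness threshold are assumptions made for this example, not RCAC values.

```shell
# Hypothetical helper: report whether a log file was written to recently,
# then show the latest entries that mention one username.
# The 60-minute default is an illustrative assumption.
check_log() {
    log_file=$1
    username=$2
    max_age_minutes=${3:-60}

    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file"
        return 2
    fi

    # find prints the path only if it was modified less than N minutes ago.
    if [ -n "$(find "$log_file" -mmin "-$max_age_minutes")" ]; then
        echo "recent activity in $log_file"
    else
        echo "stale: no writes to $log_file in the last $max_age_minutes minutes"
        return 1
    fi

    # Show the last few lines that mention the user, if there are any.
    grep -- "$username" "$log_file" | tail -n 5 || true
    return 0
}
```

For example, check_log /var/log/transd_log USERNAME prints whether the log is stale and, if it is not, the five most recent lines containing USERNAME.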

Asking IAMO

The Footprints queue for IAMO is ITAP_IDENTITY_MANAGEMENT. Ben Lewis and Scott Morris are familiar with our web app and should be familiar with these "account failed" emails. If they come back and say the account is expired, graduated, etc., contact the faculty separately with this information (see below). Otherwise, Ben should be able to push accounts or unjam the logjam.

Login Shell /opt/acmaint-3.10/etc/disable is invalid.

This means the user account is no longer valid, i.e., they graduated. Remove the account from the Manage User page, and separately (don't use the FP ticket) inform the faculty member who added them that we were unable to create an account for the user. It is good to verify the student's graduation status with the PI (usually that will ring some bells with the faculty). They will need to have an R4P filed, and then they can re-add the account once it is complete. If the faculty member thinks the student should be valid, ask IAMO about the status. The student may have been added back very recently, or there may be some other issue.

/usr/bin/xauth: error in locking authority file

Problem

I receive this message when logging in:

/usr/bin/xauth: error in locking authority file

Solution

Your home directory disk quota is full. You may check your quota with myquota.

You will need to free up space in your home directory.
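To decide what to remove, it can help to see which top-level entries in your home directory take the most space. The following is a minimal sketch using standard du, sort, and head; the function name largest_entries is hypothetical and is not a tool provided on the cluster.

```shell
# List the N largest top-level entries (including hidden ones) under a
# directory, biggest first. Sizes are reported in KiB.
largest_entries() {
    dir=${1:-$HOME}
    count=${2:-10}
    # du -sk prints one size in KiB per entry, so sort -rn can order them.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}
```

Running largest_entries with no arguments lists the ten largest entries in your home directory.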

My SSH connection hangs

Problem

Your console hangs while trying to connect to an RCAC server.

Solution

This can happen for several reasons. The most common reasons for a hanging SSH terminal are:

  • Network: If you are connected over wifi, make sure that your Internet connection is fine.
  • Busy front-end server: When you connect to a cluster, you SSH to one of the front-ends. Due to transient user loads, one or more of the front-ends may become unresponsive for a short while. To avoid this, try reconnecting to the cluster or wait until the server you have connected to has reduced load.
  • File system issue: If a server has issues with one or more of the file systems (home, scratch, or depot) it may freeze your terminal. To avoid this you can connect to another front-end.

If none of the suggestions above works, please contact rcac-help@purdue.edu, specifying the name of the server where your console is hung.
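Before contacting support, a quick way to tell a network problem from a slow server is to test whether the SSH port answers at all. The sketch below assumes bash and the timeout utility from GNU coreutils are available on your local machine; the function name probe_port is hypothetical, and HOSTNAME stands for the front-end you are trying to reach.

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout.
probe_port() {
    host=$1
    port=${2:-22}
    seconds=${3:-5}
    # bash's /dev/tcp pseudo-device opens a TCP connection; timeout stops
    # the attempt if the server never answers.
    timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, probe_port HOSTNAME 22 && echo "port 22 is reachable". If the port is reachable but the login still hangs, running ssh -v username@HOSTNAME shows at which step the connection stalls, which is useful information to include in your message.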

Questions

I worked on Gilbreth after I graduated/left Purdue, but cannot access it anymore

Problem

You have graduated or left Purdue but continue collaboration with your Purdue colleagues. You find that your access to Purdue resources has suddenly stopped and your password is no longer accepted.

Solution

Access to all Research Computing resources depends on having a valid Purdue Career Account. Expired Career Accounts are removed twice a year, during Spring and October breaks (more details at the official page). If your Career Account was purged due to expiration, you will not be able to access the resources.

To provide remote collaborators with valid Purdue credentials, the University provides a special procedure called R4P ("request for privileges") (see details under 'Data/Access' tab). If you need to continue your collaboration with your Purdue PI, the PI will have to work with their departmental Business Office to submit or renew an R4P request on your behalf.

After your R4P is completed and Career Account is restored, please note two additional necessary steps:

  • Access: Restored Career Accounts by default do not have any Research Computing resources enabled for them. Your PI will have to log in to the Manage Users tool and explicitly re-enable your access by un-checking and then re-checking the checkboxes for the desired queues/Unix group resources.

  • Email: Restored Career Accounts by default do not have their @purdue.edu email service enabled. While this does not preclude you from using Research Computing resources, email messages (whether generated on the clusters or sent as service announcements) would not be delivered, which may cause inconvenience or loss of compute jobs. To avoid this, we recommend setting your restored @purdue.edu email service to "Forward" (to an actual address you read). The easiest way to ensure this is to go through the Account Setup process.

Can I manage my Login Activity in Box?

In Box under your account settings, click the "Security" tab. You can review and remove sessions.

Jobs

Errors

cannot connect to X server / cannot open display

Problem

You receive the following message after entering a command to bring up a graphical window:

cannot connect to X server cannot open display

Solution

This can happen for several reasons:

  1. Reason: Your SSH client software does not support graphical display by itself (e.g. SecureCRT or PuTTY).
  2. Reason: You did not enable X11 forwarding in your SSH connection.

    • Solution: If you are in a Windows environment, make sure that X11 forwarding is enabled in your connection settings (e.g. in MobaXterm or PuTTY). If you are in a Linux environment, try

      ssh -Y -l username hostname

  3. Reason: You are trying to open a graphical window within an interactive PBS job without the -X option. Make sure you use the -X option with qsub after following the previous step(s) for connecting to the front-end. Please see the example in the Interactive Jobs guide.
  4. Reason: You are over quota in your home directory. If none of the above apply, make sure that you are within the quota of your home directory.
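A quick first check for the forwarding-related causes above is whether the DISPLAY environment variable is set in your session, since an SSH connection with X11 forwarding enabled sets it for you. A minimal sketch follows; the function name check_display is hypothetical.

```shell
# Report whether X11 forwarding appears to be active in this shell.
check_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: reconnect with X11 forwarding enabled (for example, ssh -Y)"
        return 1
    fi
    echo "DISPLAY is set to $DISPLAY"
}
```

If DISPLAY is set and graphical programs still fail, the cause is more likely the X server on your own machine or the home directory quota.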

bash: command not found

Problem

You receive the following message after typing a command:

bash: command not found

Solution

This means the system doesn't know how to find your command. Typically, you need to load a module to make the command available.
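To confirm whether a command is currently on your PATH, you can use the shell's built-in command -v. The sketch below wraps it in a hypothetical helper named find_command; module avail is the standard modules subcommand for listing available modules.

```shell
# Print where a command would be found, or suggest looking for a module.
find_command() {
    name=$1
    if path=$(command -v "$name" 2>/dev/null); then
        echo "$name found: $path"
    else
        echo "$name not found on PATH; try 'module avail' to look for a module that provides it"
        return 1
    fi
}
```

For example, find_command lmp reports the full path if the command is available, and otherwise points you to the module list.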

bash: module command not found

Problem

You receive the following message after typing a command, e.g. module load intel:

bash: module command not found

Solution

The system cannot find the module command. You need to source the modules.sh file as below:

source /etc/profile.d/modules.sh

Alternatively, use the following as the first line of your script. The -i option runs bash as an interactive shell, so it reads your startup files:

#!/bin/bash -i
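In a job script, the first approach can be made conditional so that modules.sh is sourced only when the module command is missing. This is a sketch under the assumption that the path shown above is correct for your system; the function name ensure_module is hypothetical.

```shell
# Make the module command available if the current shell lacks it.
# Returns 0 when module is defined afterwards, non-zero otherwise.
ensure_module() {
    init_script=${1:-/etc/profile.d/modules.sh}
    if type module >/dev/null 2>&1; then
        return 0
    fi
    if [ -r "$init_script" ]; then
        # Sourcing defines the module shell function in this shell.
        . "$init_script"
    fi
    type module >/dev/null 2>&1
}
```

A script could then call ensure_module || exit 1 before its first module load line.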

Close Firefox / Firefox is already running but not responding

Problem

You receive the following message after trying to launch the Firefox browser inside your graphical desktop:

Close Firefox

Firefox is already running, but not responding.  To open a new window,
you  must first close the existing Firefox process, or restart your system.

Solution

When Firefox runs, it creates several lock files in the Firefox profile directory (inside the ~/.mozilla/firefox/ folder in your home directory). If a newly started Firefox instance detects the presence of these lock files, it complains.

This error can happen for several reasons:

  1. Reason: You had a single Firefox process running, but it terminated abruptly without a chance to clean its lock files (e.g. the job got terminated, session ended, node crashed or rebooted, etc).
    • Solution: If you are certain you do not have any other Firefox processes running elsewhere, please use the following command in a terminal window to detect and remove the lock files:
      $ unlock-firefox
  2. Reason: You may indeed have another Firefox process running (in another Thinlinc or Gateway session on this or another cluster, or on another front-end or compute node). With many clusters sharing a common home directory, a running Firefox instance on one can affect another.
    • Solution: Try finding and closing running Firefox process(es) on other nodes and clusters.
    • Solution: If you must have multiple Firefoxes running simultaneously, you may be able to create separate Firefox profiles and select which one to use for each instance.
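If you want to see what is there before removing anything, the lock files can be listed without modifying them. The sketch below assumes the lock file names Firefox uses on Linux (lock and .parentlock inside each profile directory); the function name list_firefox_locks is hypothetical, and this is not a description of how unlock-firefox itself is implemented.

```shell
# List Firefox lock files under a profile root without deleting anything.
list_firefox_locks() {
    profile_root=${1:-$HOME/.mozilla/firefox}
    # "lock" is usually a symlink and ".parentlock" a regular file, so the
    # search matches by name only and does not filter by file type.
    find "$profile_root" -maxdepth 2 \( -name lock -o -name .parentlock \) 2>/dev/null
}
```

To check for Firefox processes on the node you are logged in to, pgrep -u "$USER" -l firefox lists any that belong to you. For separate profiles, firefox -P PROFILENAME --no-remote starts an instance with the named profile.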

Questions

How do I find the Non-Uniform Memory Access (NUMA) layout on Gilbreth?

  • You can learn about processor layout on Gilbreth nodes using the following command:
    gilbreth-a000:~$ lstopo-no-graphics
  • For detailed IO connectivity:
    gilbreth-a000:~$ lstopo-no-graphics --physical --whole-io
  • Please note that NUMA information is useful for advanced MPI/OpenMP/GPU optimizations. For most users, using default NUMA settings in MPI or OpenMP would give you the best performance.
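If you only need the mapping of CPU cores to NUMA nodes, the Linux kernel exposes it under /sys/devices/system/node. The following is a minimal sketch that reads those files directly; the function name show_numa_cpus is hypothetical, and the output has not been captured from a Gilbreth node.

```shell
# Print each NUMA node and the CPUs attached to it, read from sysfs.
show_numa_cpus() {
    sys_root=${1:-/sys/devices/system/node}
    found=0
    for node in "$sys_root"/node[0-9]*; do
        # Skip the unmatched glob pattern and any node without a CPU list.
        [ -r "$node/cpulist" ] || continue
        echo "$(basename "$node"): CPUs $(cat "$node/cpulist")"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no NUMA information under $sys_root"
}
```

Each output line has the form "nodeN: CPUs LIST", where LIST is the kernel's CPU range notation.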

Data

How is my Data Secured on Gilbreth?

Gilbreth is operated in line with policies, standards, and best practices as described within Secure Purdue, and specific to Research Computing Resources.

Security controls for Gilbreth are based on ones defined in NIST cybersecurity standards.

Gilbreth supports research at the L1 fundamental and L2 sensitive levels. Gilbreth is not approved for storing data at the L3 restricted (covered by HIPAA) or L4 Export Controlled (ITAR) levels, or any Controlled Unclassified Information (CUI).

For resources designed to support research with heightened security requirements, please look for resources within the REED+ Ecosystem.

For additional information

Log in with your Purdue Career Account.

Can I share data with outside collaborators?

Yes! Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation on how to share data.

Can I access Fortress from Gilbreth?

Yes. While Fortress directories are not directly mounted on Gilbreth for performance and archival protection reasons, they can be accessed from Gilbreth front-ends and nodes using any of the recommended methods: HSI, HTAR, or Globus.

Software

Cannot use pip after loading ml-toolkit modules

Question

Pip throws an error after loading the machine learning modules. How can I fix it?

Answer

Machine learning modules (tensorflow, pytorch, opencv, etc.) include a version of pip that is newer than the one installed with Anaconda. As a result, the pip command throws an error when you try to use it.

$ pip --version
Traceback (most recent call last):
  File "/apps/cent7/anaconda/5.1.0-py36/bin/pip", line 7, in <module>
    from pip import main
ImportError: cannot import name 'main'

The preferred way to use pip with the machine learning modules is to invoke it via Python as shown below.

$ python -m pip --version

How can I get access to Sentaurus software?

Question

How can I get access to Sentaurus tools for micro- and nano-electronics design?

Answer

The Sentaurus software license requires a signed NDA. Please contact Dr. Mark Johnson, Director of ECE Instructional Laboratories, to complete the process.

Once the licensing process is complete and you have been added to the cae2 Unix group, you can use Sentaurus on RCAC community clusters by loading the corresponding environment module:

module load sentaurus

About Research Computing

Can I get a private server from RCAC?

Question

Can I get a private (virtual or physical) server from RCAC?

Answer

Often, researchers may want a private server to run databases, web servers, or other software. RCAC currently does not offer private servers (formerly known as "Firebox").

For use cases like this, we recommend the Jetstream Cloud (http://jetstream-cloud.org/), an NSF-funded science cloud allocated through the XSEDE project. RCAC staff can help you get access to Jetstream for testing, or help write an allocation proposal for larger projects.

Alternatively, you may consider commercial cloud providers such as Amazon Web Services, Azure, or Digital Ocean. These services are very flexible, but do come with a monetary cost.
