
Unscheduled Bell outage

  • Outages
  • Bell

January 13, 2021 12:52am EST UPDATE:

As of 12:45 am, engineers have resolved the Bell scratch issue, and the cluster has returned to normal service. Job queues have been re-enabled and job scheduling has resumed. We apologize for the disruption. Please report any issues to rcac-help@purdue.edu.

January 12, 2021 7:10pm EST UPDATE:

At about 5:45 pm on Tuesday, January 12, 2021, the problem with Bell scratch returned. Work continues with our storage vendor to troubleshoot and fix the issue. Job scheduling on Bell was stopped again as of 6:50 pm.

We appreciate your patience and will provide another update by noon tomorrow.

January 12, 2021 2:51pm EST UPDATE:

As of 2:30 pm, our engineers, working with the storage vendor, have resolved this issue.

Bell has returned to full operation and job scheduling has resumed.

January 12, 2021 10:29am EST UPDATE:

As of 10:00 am, work on this issue is still ongoing. Job scheduling on Bell remains paused.

We will post an update by 6:00 pm today.

January 11, 2021 6:09pm EST UPDATE:

As of 6:00 pm, engineers are continuing to work with the storage system vendor to resolve this problem. Job scheduling on Bell is still paused.

We will post an update by 10:00 am tomorrow (January 12).

January 11, 2021 11:42am EST UPDATE:

Engineers are working with the vendor of the Bell scratch storage system to troubleshoot and identify the problem. Scheduling of new jobs is still paused.

We will post an update here by 6:00 pm.

ORIGINAL:

Around 9:00 pm, the Bell cluster began experiencing metadata issues on its scratch filesystem. The symptom is that the ls -l command hangs indefinitely, while a plain ls (or \ls, or stat FILE) appears to work normally.

Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by noon tomorrow.
