Unscheduled Bell outage

  • October 22, 2021 1:20pm - 5:20pm EDT
  • Outages and Maintenance
  • Bell

UPDATE: October 22, 2021 5:16pm

As of 5:15pm, the extraneous processes impacting the scratch filesystem have been identified and terminated, and the Bell cluster has returned to normal service. Job queues have been enabled and job scheduling has resumed. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

ORIGINAL: October 22, 2021 1:20pm - 5:20pm EDT

The Bell cluster began experiencing high load and sluggish performance on the scratch filesystem around 1:20pm. Engineers are diagnosing the issue and working to identify a fix. Job scheduling has been paused while the issue is addressed.

We will provide an update by 6pm.

Originally posted: October 22, 2021 1:45pm EDT