Outages

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its home and scratch directories filesystem around 12:40pm EDT. Problems manifest as hanging new logins and unresponsive established sessions. Engineers are currently diagnosing the issue and are workin...

  • Intermittent Access Failures on Data Depot

    As of Thursday, June 17th, 2021 at 11:00am EDT, users of community clusters may experience intermittent "permission denied" errors while trying to access their files on Data Depot. Errors may come and go, and may appear on both login and c...

  • Unscheduled Fortress outage

    The Fortress tape archive began experiencing load-induced issues around 1:00pm EDT. Problems manifest as various errors and timeouts while trying to access Fortress or transfer data. Engineers are currently diagnosing the issue and are working to ide...

  • Unscheduled outage on multiple clusters

    Due to problems with the cooling system in the MATH datacenter, the CMS, Bell, Brown, Gilbreth, Halstead, WCERES, and WSC Hadoop clusters began experiencing issues around 4:00pm EDT. Multiple front-end, compute and storage services are affected. Engineer...

  • ANSYS Fluent software unavailable on Bell

    We have received multiple reports that ANSYS Fluent software is unavailable on the Bell cluster. We are currently diagnosing the issue and are working to identify a fix. We will provide an update by 6pm tonight.

  • Unscheduled Workbench outage

    The Workbench cluster began experiencing issues with its network uplink around 6:30pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 10 pm.

  • Gilbreth queue submission problems

    We have received multiple user reports that the Gilbreth cluster began experiencing issues with job submissions over the weekend. The problem manifests as an "Invalid account or account/partition combination specified" error message from sbatch...

  • Unscheduled Halstead outage

    The Halstead cluster began experiencing issues with its scratch filesystem mount around 4:30pm EST. Users may see "Stale file handle" messages or be unable to navigate to their scratch directories. Engineers are currently diagnosing the iss...

  • Unscheduled Data Depot outage

    The Data Depot storage server began experiencing issues around 3:00pm EST on Thursday, February 4th, 2021. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all clusters while this issue...

  • Scholar access outage

    A large number of Scholar accounts were accidentally removed during overnight processing. This manifests as "LDAP authorization check failed", "Incorrect or Invalid username/password", and similar errors when trying to logi...

  • Unscheduled Data Depot outage

    The Bell, Brown, Gilbreth, Halstead, Rice, Scholar, and Snyder clusters began experiencing issues with their Data Depot mounts around 10:00pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. To avoid job losses for...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scratch filesystem around 4:00pm EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is bein...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with its scratch filesystem around 5:00am EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is be...

  • Unscheduled Bell outage

    The Bell cluster began experiencing issues with metadata on its scratch filesystem around 9:00pm. The problem manifests as the ls -l command hanging indefinitely, while plain ls (or \ls, or stat FILE) appears to work. Engineers are...

  • Access to RCAC Resources During ITaP Central Authentication Outage

    On Sunday, December 27th, 2020, ITaP staff will perform major upgrades to the central authentication infrastructure. All applications that require logging in with BoilerKey or Career Account credentials will be unavailable Sunday, December 27, 2020 f...

  • Unscheduled Brown outage

    The Brown cluster began experiencing issues with its job scheduler around 4:00pm EST. The problem manifests as Slurm-related commands (slist, squeue, sinteractive, sbatch, etc.) being slow, unresponsive, or timing out. Queue selection dialogs in...

  • Unscheduled RCAC GitHub outage

    The github.rcac server will be unavailable Friday, October 16, 2020 from 7:00pm – 11:59pm for emergency maintenance. During this time, the server will undergo maintenance tasks that cannot be completed with the server in production. Opera...

  • Halstead Scratch Issues

    The Halstead cluster began experiencing issues with its scratch filesystem around 1:15 pm, Sunday 11 Oct. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being address...

  • Unscheduled Halstead outage

    The Halstead cluster began experiencing issues with its scratch filesystem around 9:00pm. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will prov...

  • Halstead Scratch Issues

    Halstead's scratch filesystem began experiencing issues this morning (Sunday 27 Sep). Job scheduling has been paused while engineers and the system vendor investigate the issue. We will have an update by tomorrow morning (Monday 28 Sep) at 10:00 am.