Outages

Unscheduled Data Depot and community clusters outage
- August 18, 2021 9:30am - 5:30pm EDT
At about 9:30am EDT, Data Depot servers started experiencing a steadily rising load. Coupled with ongoing scaling issues with the metadata subsystem, this caused Data Depot to become increasingly unresponsive for both community clusters and network d...
Unscheduled Data Depot outage on multiple clusters
- August 13, 2021 12:30am - 9:20am EDT
The Bell, Brown, Gilbreth, Halstead, Scholar, and Workbench clusters began experiencing issues with mounting the old Data Depot filesystem around 12:30am EDT. Multiple nodes are flagged offline by an automatic check, and bioinformatics application suite...
Unscheduled Brown and Hammer outage
- August 2, 2021 5:40pm - 7:40pm EDT
The Brown and Hammer clusters began experiencing issues with cooling in the POD data center around 5:40pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being add...
Unscheduled Brown, Hammer and Weber outage
- August 2, 2021 11:00am - 12:00pm EDT
The Brown, Hammer, and Weber clusters began experiencing issues with cooling in the POD data center around 11:00am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is...
Unscheduled Brown outage
- July 26, 2021 9:00pm - 11:05pm EDT
The Brown cluster began experiencing issues with cooling around 9:00pm EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will provide an update...
Scheduling Paused on Multiple Clusters
- July 21, 2021 4:00pm - 6:00pm EDT
At about 4:00 pm today (Wednesday, 21 July, 2021) System Engineers found an issue with the schedulers on the Bell, Brown, Gilbreth, Halstead, and Scholar clusters. Job scheduling has been paused while this is being investigated. Symptoms of this pro...
Unscheduled Gilbreth outage
- July 1, 2021 5:00pm - July 2, 2021 5:00pm EDT
The Gilbreth cluster began experiencing issues with its scratch file system around 5:00pm EDT on Thursday, July 1st, 2021. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue...
Unscheduled Bell outage
- June 24, 2021 12:40pm - 3:00pm EDT Last updated: June 24, 2021 1:44pm EDT
The Bell cluster began experiencing issues with its home and scratch directories filesystem around 12:40pm EDT. Problems manifest as hanging new logins and unresponsive established sessions. Engineers are currently diagnosing the issue and are workin...
Intermittent Access Failures on Data Depot
- June 17, 2021 11:00am - June 18, 2021 10:45am EDT
As of Thursday, June 17th, 2021 at 11:00am EDT, users of community clusters may experience intermittent "permission denied" errors while trying to access their files on Data Depot. Errors may come and go, and may appear on both login and c...
Unscheduled Fortress outage
- May 3, 2021 1:00pm - 6:20pm EDT Last updated: May 3, 2021 6:22pm EDT
The Fortress tape archive began experiencing load-induced issues around 1:00pm EDT. Problems manifest as various errors and timeouts while trying to access Fortress or transfer data. Engineers are currently diagnosing the issue and are working to ide...
Unscheduled outage on multiple clusters
- April 29, 2021 4:00pm - April 30, 2021 1:45pm EDT Last updated: April 30, 2021 1:45pm EDT
Due to problems with the cooling system in the MATH datacenter, the CMS, Bell, Brown, Gilbreth, Halstead, WCERES, and WSC Hadoop clusters began experiencing issues around 4:00pm EDT. Multiple front-end, compute and storage services are affected. Engineer...
ANSYS Fluent software unavailable on Bell
- April 5, 2021 10:00am - 12:30pm EDT Last updated: April 5, 2021 12:43pm EDT
We have received multiple reports that ANSYS Fluent software is unavailable on the Bell cluster. We are currently diagnosing the issue and are working to identify a fix. We will provide an update by 6pm tonight.
Unscheduled Workbench outage
- March 11, 2021 6:30pm - 9:00pm EST Last updated: March 11, 2021 9:01pm EST
The Workbench cluster began experiencing issues with its network uplink around 6:30pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 10 pm.
Gilbreth queue submission problems
- February 19, 2021 5:00pm - February 22, 2021 2:50pm EST Last updated: February 22, 2021 2:59pm EST
We have received multiple user reports that the Gilbreth cluster began experiencing issues with job submissions over the weekend. The problem manifests as an "Invalid account or account/partition combination specified" error message from sbatch...
Unscheduled Halstead outage
- February 6, 2021 4:30pm - 9:30pm EST Last updated: February 6, 2021 9:23pm EST
The Halstead cluster began experiencing issues with its scratch filesystem mount around 4:30pm EST. Users may see "Stale file handle" messages or be unable to navigate to their scratch directories. Engineers are currently diagnosing the iss...
Unscheduled Data Depot outage
- February 4, 2021 3:00pm - 9:00pm EST Last updated: February 4, 2021 8:58pm EST
The Data Depot storage server began experiencing issues around 3:00pm EST on Thursday, February 4th, 2021. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused on all clusters while this issue...
Scholar access outage
- January 27, 2021 9:00am - 1:00pm EST Last updated: January 27, 2021 1:12pm EST
A large number of Scholar accounts have been accidentally removed during overnight processing. This manifests as "LDAP authorization check failed", or "Incorrect or Invalid username/password" and similar errors when trying to logi...
Unscheduled Data Depot outage
- January 25, 2021 10:00pm - 11:50pm EST Last updated: January 25, 2021 11:49pm EST
The Bell, Brown, Gilbreth, Halstead, Rice, Scholar, and Snyder clusters began experiencing issues with their Data Depot mounts around 10:00pm EST. Engineers are currently diagnosing the issue and are working to identify a fix. To avoid job losses for...
Unscheduled Bell outage
- January 22, 2021 4:00pm - January 23, 2021 5:45pm EST Last updated: January 23, 2021 5:52pm EST
The Bell cluster began experiencing issues with its scratch filesystem around 4:00pm EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is bein...
Unscheduled Bell outage
- January 21, 2021 5:00am - 2:00pm EST Last updated: January 21, 2021 2:04pm EST
The Bell cluster began experiencing issues with its scratch filesystem around 5:00am EST. Engineers are currently diagnosing the issue and have opened a ticket with the vendor to identify a fix. Job scheduling has been paused while this issue is be...
