Outages
-
Campus power outage affecting multiple clusters
Shortly before 9:00 AM Eastern time, a campus power outage caused a power interruption on many RCAC clusters, which interrupted some work. UPDATE: Engineers have arrived on campus and found additional impacts from today's power interruption. S...
-
The Negishi Scratch file system began experiencing issues around 11am EST today (December 16). The issue manifests as a "No space left on device" warning on new file creation. Reading and writing existing files is not affected. Engineers ar...
-
Data Depot began experiencing issues with file permissions around 8pm this evening that have since resolved themselves. Users will have noticed a "permission denied" error when attempting to access spaces they normally would be able to read based...
-
Unscheduled Storage Permissions Outage
Data Depot and other group-restricted spaces began experiencing issues with file permissions around 5pm. Users will notice a "permission denied" error when attempting to access spaces they normally would be able to read based on their unix...
-
We are currently experiencing network connectivity problems with the Gautschi community cluster. Engineers are investigating and will provide an update by noon 1/18/2025. Update: This has been resolved.
-
The Anvil cluster began experiencing issues with electrical power around 2:30 PM EST. RCAC engineers are working with Purdue electricians to safely restore power. Anvil is operating at reduced capacity because a handful of nodes were shut down as a pre...
-
Update: Tuesday, January 21st, 2025 at 3:02pm EST: The situation has been corrected and job scheduling is running again on Negishi. The Negishi cluster began experiencing issues with electrical power around 2:30 PM. RCAC engineers are working with P...
-
The Fortress storage system began experiencing issues earlier today related to one of its administrative servers. This results in access being denied to users attempting to connect/authenticate; e.g., via hsi or htar command-line tools or the Globus t...
-
The Gautschi cluster began experiencing issues with internal fabrics around 02:30 2025-02-13. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will...
-
We have noticed a discrepancy in the allocation usage after the outage, so you may see incorrect usage for your allocation(s) from mybalance. Our engineers are working on a fix. Job scheduling will NOT be impacted. We will provide an update by 5:00p...
-
Unscheduled Gautschi cluster outage
The Gautschi cluster began experiencing issues with its power feed around 06:45am. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will provide an...
-
At around 11:00am, Bell's scratch filesystem began to show signs of severe performance degradation. We have paused job scheduling on Bell while engineers investigate the slowdown and work to bring things back up to speed. We will provide an upda...
-
Unscheduled Bell, Negishi, and Gilbreth outage
At around 1:30 PM, Bell, Negishi, and Gilbreth began to exhibit excessively high temperatures. Job scheduling has been paused while this issue is being addressed. We will provide an update by 8:00 PM.
-
At around 10:45am EDT, Gilbreth scheduling was paused in order to reduce thermal load while emergency plumbing work is performed on the cooling loop that serves the Gilbreth cluster. Login access to Gilbreth remains available for file access and jobs...
-
Power outage impacting multiple clusters
Due to a campus-wide power outage, the Anvil, Negishi, Rowdy, and Scholar clusters experienced an unscheduled reboot on Friday, April 4th, 2025 at 8:30am EDT. Engineers are currently diagnosing the impacted nodes and bringing services back online. If...
-
The Anvil cluster began experiencing permissions issues with project directories around noon. We are working on a fix. Job scheduling has been paused while this issue is being addressed. Update: the issue was fixed by 2:20 pm today.
-
RCAC systems are experiencing networking-related issues that impact access to some destinations on the Internet. We are actively monitoring the situation and working to resolve the disruptions as quickly as possible. During this time, you may encount...
-
Beginning at 19:00 EST, the Anvil cluster will experience a brief network interruption to fix an issue related to network connectivity. Expected return to service is 19:30 EST.
-
Beginning at 10:00 AM EST on June 10th, the Anvil cluster experienced a brief network interruption to fix an issue related to network connectivity. We are continuing to work on it, and the expected return to service is 11:00 AM EST the end of the day. Upda...
-
Unscheduled Weber Login Interruption
The Weber cluster began experiencing issues with logins around 10:00am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. We will provide an update by 2:00pm EDT.