Outages and Maintenance
-
The Bell Gateway began experiencing issues following the October 4-6 Bell maintenance. In particular, gateway applications have been observed to fail when users attempt to connect to them after launching a job. Our engineers are investigating...
-
The Weber cluster will be unavailable Thursday, October 6, 2022 from 8:00am - 5:00pm EDT for scheduled maintenance. The cluster will return to full production by Thursday, October 6th, 2022 at 5:00pm EDT. During this time, Weber will have the operati...
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, October 5, 2022 from 8:00am - 12:00pm EDT for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any transf...
-
The Bell cluster will be unavailable for use October 4, 2022 8:00am - October 5, 2022 11:59pm EDT while we perform scheduled maintenance to expand Bell's scratch storage capabilities and make upgrades to Bell's AMD GPU drivers and other system soft...
-
We unexpectedly need to power down the Geddes resource as part of today's Bell maintenance while engineers work on the cooling systems for the upcoming deployment of the Negishi cluster. The Geddes cluster will be unavailable October 4, 2022 8:0...
-
The Bell cluster continues to experience hardware issues. Engineers are currently diagnosing the issues and are working with vendors to schedule and perform repairs as quickly as possible. Job scheduling continues, but you may experience longer...
-
A section of the Bell cluster compute nodes began experiencing issues with power feed and cooling around 2:30pm EDT. Engineers have powered down affected nodes and are working to identify a fix. Some jobs may have been terminated or requeued. Jo...
-
Brown will be unavailable Thursday, September 1, 2022 from 8:00am - 5:00pm EDT for scheduled maintenance. During this maintenance, home directories on Brown will be separated from the legacy home filesystem shared amongst several other clusters as a...
-
The Brown cluster began experiencing issues with its scratch filesystem around 11:20am EDT. Engineers are currently diagnosing the issue and are working to identify a fix. Job scheduling has been paused while this issue is being addressed. We will pr...
-
MATH data center cooling outage
The Math building data center began experiencing issues with its cooling system around 1:40pm EDT. To minimize thermal load on the cooling infrastructure, job scheduling has been paused and all idle compute nodes on Anvil, Bell, Geddes, Gilbreth, and...
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, August 3, 2022 from 8:00am - 12:00pm EDT for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any transfe...
-
[REVISED] Scheduled Gilbreth Upgrade
Gilbreth will be unavailable for maintenance on July 20th from 8:00am - 5:00pm to allow for an expansion of Gilbreth’s resources. In response to the growing demand for hardware that can facilitate GPU-accelerated workloads, Gilbreth is being expanded to...
-
Bell Scratch Degraded Storage (Returned to Service)
Bell Scratch is near capacity and performance is degraded. As of this morning, Bell Scratch was 94% full. This afternoon we paused scheduling as scratch was not responding consistently. We have more drives on order, but with global supply chain issue...
-
Fortress Tape Library Maintenance
The Fortress Archive will be unavailable Thursday, June 23, 2022 from 8:00am - 6:00pm EDT for a full-day maintenance on its SpectraLogic tape library. During this time, the vendor will perform required servicing of the library, including firmware and...
-
The Halstead cluster began experiencing issues with its scratch file system around 8:00am EDT. The problem manifests as various I/O errors or hangs when reading, writing or listing scratch directories. Engineers are currently diagnosing the issue and...
-
Globus services downtime on June 18, 2022
Globus services will be globally unavailable for a period between 9:00 am and 10:00 am US Central Time (10:00 am - 11:00 am Eastern) on Saturday, June 18th, 2022 due to planned maintenance that includes database upgrades. During the downtime, the following will be...
-
Brown, Hammer and Weber Cluster Maintenance
The Brown, Hammer, and Weber clusters will be unavailable Tuesday, June 14, 2022 from 6:00am - 11:59pm EDT for scheduled maintenance. The clusters will return to full production by Tuesday, June 14th, 2022 at 11:59pm EDT. During this time, Physical F...
-
Scheduling Paused on Brown and Hammer
Around 2:00pm EDT, the ailing cooling systems for Brown and Hammer began experiencing issues. To reduce the thermal load on the systems, scheduling of new jobs has been paused on Brown and Hammer and will not be resumed until after tomorrow...
-
Anvil will be undergoing scheduled maintenance on Thursday, June 2, 2022 from 8:00am - 6:00pm EDT. During this time, Anvil will be unavailable for use as we fine-tune the home directory filesystem and deploy updates to the scheduler. Please see detaile...
-
Fortress Archive Monthly Maintenance
The Fortress Archive will be unavailable Wednesday, June 1, 2022 from 8:00am to 12:00pm for scheduled monthly maintenance (first Wednesday of every month). During this time, Fortress will receive normal software and hardware updates. Any transfers w...