Outages and Maintenance
-
Emergency Chilled Water Maintenance
In early June 2014, all RCAC systems housed in the Math Sciences building will be unavailable due to an emergency repair to the redundant chilled-water system serving the MATH datacenter. A major chilled-water line has developed a leak and must...
-
During the maintenance scheduled for May 19-20, the Hansen cluster will be upgraded to Red Hat Enterprise Linux, version 6. Only those PBS jobs with walltimes short enough that they will finish prior to the beginning of this maintenance period are bei...
-
The Carter cluster will be unavailable beginning at 8:00am on Monday, May 19, 2014, for scheduled maintenance. The cluster will return to full production by 5:00pm, Wednesday, May 21. During this time, Carter will receive operating system patches, an...
-
The Peregrine1 cluster will be unavailable beginning at 8:00am on Monday, May 19, 2014, for scheduled maintenance. The cluster will return to full production by 5:00pm, Tuesday, May 20. During this time, the cluster's network link to West Lafayette...
-
Scheduling Paused on Hansen and Carter
The scratch filesystem on Hansen and Carter is currently unavailable due to a hardware issue. Attempts to access scratch will block until the filesystem is back online. Job scheduling on Hansen and Carter has been paused while storage engineers addre...
-
To repair a hardware issue with the underlying disk storage that makes up LustreC, ITaP storage engineers will perform brief maintenance on the filesystem on Monday morning, April 7, 2014. This issue is currently impacting the filesystem's re...
-
Fortress HPSS Archive Maintenance
The IBM T3584 tape library serving Fortress is scheduled to be down on Wednesday, March 19, 2014, from 8:00am to 5:00pm for a hardware upgrade. Additional tape capacity and tape drives will be added to support increased demand. HPSS will remain accessible, ne...
-
During the maintenance scheduled for 3/15/2014-3/16/2014, the Rossmann cluster will be upgraded to Red Hat Enterprise Linux, version 6. Only those PBS jobs with walltimes short enough that they will finish prior to the beginning of this maintenance...
-
UPDATE - As of 7:45pm Sunday, March 16th, 2014, the fileserver maintenance has completed successfully, and cluster systems are back online.
All Research Computing systems will be unavailable from 8:00am Saturday, 3/15/2014 through Sunday, 3/16/2014...
-
The Lustre D filesystem, serving the Conte cluster, has been unavailable since about 8:00 pm Thursday, 13 Feb 2014. System engineers are working to bring the system back to 100% operation. Currently running jobs should be able to continue, but sch...
-
The Hansen, Coates, and Rossmann clusters will be unavailable beginning at 8:00am on Tuesday, January 7, 2014, for scheduled maintenance. The clusters will return to full production by 5:00pm, Wednesday, January 8. During this time, these systems wil...
-
Hansen and WinHPC Clusters at Reduced Capacity
On December 21, 2013, the Hansen and WinHPC clusters will operate at reduced capacity while datacenter power maintenance is performed on a portion of the system. In the days leading up to December 21st, this will appear as potentially increased queue...
-
Lustre D Filesystem Unavailable
Update - 2:25pm, 12/16/2013: The LustreD scratch filesystem has been returned to service and both the filesystem and scheduler appear to be working properly. Conte has been returned to normal production service as of 2:20pm.
Update - 10:30am, 12/16/2...
-
Maintenance Completed on LustreD Filesystem
UPDATE - 6:00 pm, 14 Dec 2013: As of 5:45 pm, we believe this problem has been corrected and Conte has returned to normal operation.
The LustreD filesystem, serving the Conte cluster, has been experiencing issues since about 4:30 pm Saturday, 14 Dec 2013. S...
-
All ITaP Research Computing systems are currently having trouble accessing network filesystems. A case has been opened with our vendor as ITaP engineers troubleshoot the issue. Cluster users may experience issues accessing files in /home,...
-
Nearly all major clusters operated by ITaP Research Computing are stopped due to issues with their storage systems relating to the power loss on the West Lafayette campus in the wake of the severe weather Sunday night. This includes: Conte, Carter,...
-
The Fortress HPSS Archive is offline due to issues with its storage systems relating to the power loss on the West Lafayette campus in the wake of the severe weather Sunday night. Engineers are investigating the problem now, but until this is reso...
-
Update: 11:00pm, Nov. 12, 2013: ITaP storage engineers have returned the offline hardware to production and LustreC is back in production. Queues on Hansen and Carter have been restarted as of 11:45pm.
Update: 5:00pm: Following consultation with vendor...
-
Partial scratch96 Filesystem Outage
On the evening of 10/10/2013, the fileserver providing the "scratch96" filesystem serving some users of the Steele and Radon clusters suffered a permanent failure of its second-tier storage. This means that files on scratch96 that are older th...
-
Fortress HPSS Archive Unavailable
Update - 10:15 am: Fortress is back in full production.
Original Message: As of 8:00am, Thursday, September 19, the Fortress HPSS is temporarily unavailable due to problems communicating with its tape drives. Storage engineers are working to return...