Coates
-
RCAC system and data center maintenance
RCAC systems including the DXUL/fortress archival storage system will be unavailable beginning at 8am Tuesday, 3/17, while system and MATH data center maintenance is performed. All systems are expected to be back in service by 6pm Thursday, 3/19, and...
-
RCAC system maintenance scheduled
RCAC systems will be unavailable from 8am-6pm Friday, October 9, for electrical work in the MATH data center and system maintenance. The Coates Linux cluster will not be returned to service until Tuesday afternoon, October 13, so RCAC staff can cond...
-
Coates cluster network problems
Network problems arose following Coates cluster maintenance Tuesday, January 5. ITaP staff are working to resolve these problems, but we are currently unable to say when Coates will be returned to production. Final Update: The problems which arose f...
-
Cooling problems on coates-b, -c, and -e nodes
Coates-b, -c, and -e nodes have been powered down due to a problem with a CDU (cooling distribution unit) that cools those systems. PBS jobs running on those nodes at the time have been requeued for execution after cooling has been restored and the...
-
Coates and Rossmann cluster job scheduling temporarily suspended
Job scheduling on the Coates and Rossmann Linux clusters was disabled from 7:15-10:20pm Saturday, October 30, due to a partial cooling loss in the MATH datacenter.
-
Lustre scratch storage system unavailable
The Lustre storage system that provides scratch storage on the Rossmann and Coates Linux clusters (via /scratch/lustreA) failed at approximately 1:30pm Thursday, February 3. ITaP Storage Engineers are in MATH working on the problem, but we are curre...
-
ITaP research computers to be down during building upgrades
What’s happening? ITaP’s research computing systems will be shut down beginning at 3 a.m. Tuesday, March 29. The Coates and Rossmann cluster supercomputers could be off through 6 p.m. Thursday, March 31. Why? An outage related to an ongoing power and...
-
All RCAC systems unavailable some portion of Tue-Fri, 3/29-4/1
All RCAC systems will be unavailable on Tuesday, March 29th from 3:00am – 6:00pm. The Rossmann, Coates, Radon, and Moffett clusters will remain down through 6:00pm Thursday, March 31st. Update, 9:00am, March 29: Power has been restored to the Math b...
-
Aug. 5-17 research computing system outage FAQ
What’s happening? ITaP’s research computing systems will be shut down beginning at 5 p.m. Friday, Aug 5, including the Rossmann, Coates, Moffett and Radon clusters. The supercomputers are scheduled to be off until Wednesday, Aug. 17. Why? An outage r...
-
MATH Datacenter upgrades, starting Friday, August 5
Beginning at 5:00 pm, Friday, August 5th, the Coates and Rossmann supercomputer clusters will be unavailable due to work to complete a power and cooling upgrade to the Math Sciences building datacenter. We estimate that these clusters will be unavail...
-
This week, ITaP engineers have been troubleshooting issues with the Coates cluster, with the most common symptom being PBS jobs that abort or restart after some period of run time. Late yesterday afternoon, a change was made to the cluster's networki...
-
The LustreA scratch filesystem, used by Rossmann and Coates, suffered an unknown failure sometime in the early morning of November 15, 2011. LustreA was returned to normal operation at about 10:30am. Any jobs on those systems run overnight before t...
-
This morning, the PBS system on Coates developed an issue with the storage holding its internal state. While systems engineers work on recovering it from backup, new job submissions will not be possible, nor will you be able to query job st...
-
System Maintenance - Spring Break 2012
During the week of spring break, 2012, the Steele, Coates, and Rossmann clusters will each be down for maintenance for one day to install OS patches and update the PBS batch software to version 11.1. Additionally, the Radon cluster will be unavailabl...
-
Partial outage affecting some Coates queues
Update - 6:45 pm Tuesday, 10 April 2012: ITaP engineers have found and repaired the network issue that was affecting Coates node types B, C, and E. Job scheduling has been resumed for all queues. If you encounter any problems, please report them to rc...
-
Unscheduled Power outage in Math Datacenter
Update: 10:00pm Tuesday. As of 8:30pm Tuesday, 21 August 2012, the LustreB filesystem has been returned to full service. Our storage engineers, with the assistance of the vendor, have verified that the system is stable. If you encounter any issues, please co...
-
Scheduled Maintenance - October 2012
UPDATE, 9 October, 2012: The Coates and Rossmann clusters have both returned to production, and their maintenance was completed as of 11:30 am, Tuesday, 9 October, 2012. The Coates and Rossmann clusters will go down for scheduled maintenance at 8:00 am...
-
Scheduling paused on ITaP research clusters
During scheduled maintenance on network equipment connecting storage to ITaP clusters, all scheduling will be paused from 4-6pm. Running jobs will continue to execute, and new jobs may be submitted to PBS queues, but no new jobs will start u...
-
Software Stack Changes during Scheduled Maintenance
During the New Year's holiday weekend, all ITaP HPC resources will be unavailable due to a scheduled upgrade of research home directories. While the systems are down, they will also receive several updates to the software stack and modules. These upda...
-
Scheduled Maintenance - RCAC home directory upgrades
Update - 7:00pm, 1/4/2013: All community clusters (Steele, Coates, Rossmann, Hansen, Carter, and Peregrine1) are back in production. Radon is currently not in production, as ITaP engineers are addressing issues encountered during the upgrade. T...