Lustre scratch storage system unavailable

February 3 – 4, 2011
Coates, Rossmann

The Lustre storage system that provides scratch storage on the Rossmann and Coates Linux clusters (via /scratch/lustreA) failed at approximately 1:30pm Thursday, February 3. ITaP storage engineers are in MATH working on the problem, but we cannot yet estimate when the storage system will return to service. As a result, PBS job scheduling on Rossmann and Coates has been paused until the storage system is back in operation.

9pm Thursday, 2/3: Diagnostics run by ITaP and vendor storage engineers indicated that part of the disk system in one of the servers that make up LustreA suffered a hardware failure. Replacement parts are being express-shipped to Purdue and are expected to arrive Friday morning, 2/4.

11:30am Friday, 2/4: We have been notified that the replacement parts needed to repair the Lustre storage system are expected to arrive at Purdue early Friday afternoon, 2/4.

8pm Friday, 2/4: ITaP and vendor storage engineers are running diagnostics and checking the integrity of the Lustre storage system but are not yet ready to release it for production.

9pm Friday, 2/4: This outage has been resolved. The Rossmann and Coates Linux clusters' Lustre file system was returned to service and PBS job scheduling resumed at 9pm Friday, 2/4.

Originally posted: February 10, 2011


