Unscheduled scratch outage on Rossmann

August 24, 2015 3:30 pm – August 25, 2015 9:00 pm

Update: August 25, 2015 9:00 pm

On Monday, August 24, a disk tray in the Rossmann scratch storage system suffered multiple failures. Despite great effort by both ITaP storage engineers and the system vendor, this portion of the scratch system could not be recovered.

Approximately 90% of the files on the Rossmann scratch system were unaffected. However, any files stored on the failed portion of the scratch storage system cannot be recovered.

If any of your files were affected, you will be contacted directly with a list of the specific files that could not be recovered. Attempts to access any files on the affected storage device will result in an error message similar to:

"cannot access path/to/file/name: Input/output error"

This message means that all or a portion of the file was on the offline storage device and could not be recovered. You should remove any such file using "rm -f filename".
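If you would like to check a directory yourself rather than wait for the list, the following is a minimal sketch, not an ITaP-provided tool. It assumes that a damaged file raises an "Input/output error" (errno EIO) either when its metadata is read or somewhere while its contents are read. The function names find_unreadable_files and is_io_error are hypothetical and chosen only for illustration. The script only reports paths and does not delete anything.

```python
import errno
import os
import stat


def is_io_error(exc):
    """Return True if exc is an OSError reporting EIO (Input/output error)."""
    return isinstance(exc, OSError) and exc.errno == errno.EIO


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Walk root and return regular files whose metadata or contents raise EIO.

    Assumption: a damaged file fails with EIO on lstat() or at some point
    while it is read from start to end. Other errors (for example,
    permission denied) are ignored. Directories that cannot be listed are
    skipped silently, which is the default behavior of os.walk().
    """
    damaged = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
                # Skip symlinks, FIFOs, and other non-regular files.
                if not stat.S_ISREG(info.st_mode):
                    continue
                # Read the whole file, because only part of it may have
                # been stored on the failed device.
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                if is_io_error(exc):
                    damaged.append(path)
    return damaged


# Hypothetical usage; replace the path with your own scratch directory:
#     for path in find_unreadable_files("/path/to/your/scratch/directory"):
#         print(path)
```

Because this reads every file in full, it can take a long time and generate heavy I/O on large directories, so consider running it on one subdirectory at a time. Review the reported paths before removing anything with "rm -f".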

ITaP engineers have removed the failed component from the system, and the scratch storage for Rossmann is in good working order and has been returned to service. Job scheduling on Rossmann has been re-enabled, and the cluster has returned to full production.

We regret any inconvenience this may cause you in your research. If you have any questions, please contact us at rcac-help@purdue.edu.

Update: August 25, 2015 12:45 am

Storage engineers continue to isolate the affected hardware and are working with the vendor to develop and implement a procedure for a timely and full recovery.

Thank you for your patience as the work progresses. There is currently no ETA for returning Rossmann scratch to service. The next status update will most likely be in the morning.

Update: August 24, 2015 10:40 pm

Storage engineers are engaged with vendor support and continue to work on the issue.

Original message

The scratch filesystem serving Rossmann is currently unavailable.

Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online.

Job scheduling on Rossmann has been paused while storage engineers address the issue.

Users of other clusters may experience "hanging" when running certain commands, such as 'myquota', or when attempting to access files under /scratch/rossmann from those clusters' front-end nodes.

Originally posted: August 24, 2015 6:12 pm
