Unscheduled Scratch Outage on Carter and Scholar

March 30, 2016  4:30pm – March 31, 2016  1:45pm
Carter, Scholar

The scratch storage on Carter and Scholar has returned to normal operations. The rebuild process will continue in the background, and we will monitor storage performance for any degradation. All queues have been re-activated. This concludes the outage.

Update: March 31, 2016 1:00pm

The storage pool rebuild on the scratch filesystem for Carter and Scholar continues. All indications are positive at this point, but we are waiting for the rebuild to complete before releasing the filesystem for general use. We will issue another update by 5:00pm.

Update: March 31, 2016 9:56am

The scratch storage issue on Carter and Scholar has been isolated to a bad drive in one of the disk pools. The drive is being replaced. We will send an update by 1:00pm if Carter and Scholar have not returned to normal operations before then.

Update: March 30, 2016 11:43pm

Our storage vendor continues to investigate the issue tonight. We will post an update on the status of the scratch storage by 10:00am tomorrow (Thursday).

Original Message:

The scratch filesystem serving Carter and Scholar is currently unavailable.

Both currently running jobs and attempts to access files in scratch will block until the filesystem is back online. Job scheduling on Carter and Scholar has been paused while storage engineers address the issue.

Originally posted: March 30, 2016  4:34pm