Unscheduled Outage on Data Depot

  • February 23, 2016 11:00am - February 24, 2016 6:00pm EST
  • Outages and Maintenance
  • Data Depot

The Depot filesystem checks have all completed cleanly, and the Depot has been fully returned to normal operations. All queues on all clusters are scheduling new jobs again. Any existing jobs that had been waiting for Depot access may also resume.

Update: February 24, 2016 3:10pm

We believe the issue causing the Depot to hang has been resolved, and some systems may already be able to see files within the Depot. However, we are still completing integrity verification to be sure the filesystem is fully operational. We expect this to be done sooner, but will update again no later than 6:00pm. We are holding jobs in the queues until then as a precaution.

Update: February 24, 2016 11:03am

The Depot filesystem itself appears to be OK, but some of the servers that make it available are currently deadlocked with one another, which prevents the filesystem from being available to anyone. We have been working closely with IBM through the night to find a way to clear the deadlock and allow all the servers to start responding to filesystem requests again. Several approaches have been tried without success, and other options are now being attempted. We will issue an update as soon as we learn more, and no later than 3:00pm.

Update: February 23, 2016 9:55pm

Our vendor is still examining detailed diagnostics from the filesystem to determine how best to return it to production. We will issue another update by 10:00am tomorrow.

Update: February 23, 2016 5:16pm

Our storage engineers and our vendor's engineers are still investigating the issue, but we do not yet have an estimate of when we will be able to return to service. We will issue another update by 10:00pm, or earlier if service is restored.

Original Message:

The Data Depot filesystem is currently unavailable.

Both currently running jobs and attempts to access files in Data Depot will block until the filesystem is back online.

Originally posted: February 23, 2016 12:36pm EST