
Unscheduled Data Depot outage on the clusters

April 22, 2020 3:40pm EDT UPDATE:

As of April 22, 2020 3:40pm EDT, the Data Depot filesystem on the Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters has been returned to normal service. All the jobs we temporarily held have been released (but not the jobs that were manually held by their owners).

We apologize for the disruption of service and thank you for your patience. Please report any issues to rcac-help@purdue.edu.

April 22, 2020 11:54am EDT UPDATE:

Data Depot is now available on the front-ends and compute nodes of Gilbreth, Snyder, Scholar, Rice, and Workbench. Work continues on restoring it on the remaining clusters.

We will provide another update by 4pm or as soon as we have any additional information.

April 22, 2020 9:00am EDT UPDATE:

Additional filesystem checks and overnight stability tests on Data Depot were successful. Systems engineers will begin the process of restoring the mount on compute nodes on a per-cluster basis, while continuing to monitor the health of the Data Depot.

We will provide another update by noon.

April 21, 2020 9:34pm EDT UPDATE:

Work continues on bringing Data Depot to normal operation.

Please note that SMB/Windows Network Drive access may currently experience intermittent failures.

Job scheduling is enabled on Brown, Gilbreth, Halstead, Hammer, Rice, Scholar and Snyder while Research Data Depot is temporarily unavailable on cluster compute nodes. This entails several unusual consequences which you should be aware of. Please refer to the following article for detailed explanations: Running Jobs on Community Clusters While Data Depot is Unavailable.

We will provide another update by 9am tomorrow or as soon as we have any additional information.

April 21, 2020 3:00pm EDT UPDATE:

Work continues on bringing Data Depot to normal operation.

Please note that SMB/Windows Network Drive access may currently experience intermittent failures.

Job scheduling is enabled on Brown, Gilbreth, Halstead, Hammer, Rice, Scholar and Snyder while Research Data Depot is temporarily unavailable on cluster compute nodes. This entails several unusual consequences which you should be aware of. Please refer to the following article for detailed explanations: Running Jobs on Community Clusters While Data Depot is Unavailable.

We will provide another update by 10pm tonight or as soon as we have any additional information.

April 21, 2020 9:55am EDT UPDATE:

The Data Depot filesystem check has completed. Work continues on returning it to normal operation on the clusters.

Job scheduling has been enabled on Brown, Gilbreth, Halstead, Hammer, Rice, Scholar and Snyder while Research Data Depot is temporarily unavailable on cluster compute nodes. This entails several unusual consequences which you should be aware of. Please refer to the following article for detailed explanations: Running Jobs on Community Clusters While Data Depot is Unavailable.

We will provide another update by 4pm or as soon as we have any additional information.

April 20, 2020 8:55pm EDT UPDATE:

The Data Depot filesystem check is progressing according to the vendor-recommended procedure.

In order to return the clusters to service while this process continues, we will resume job scheduling on Brown, Gilbreth, Halstead, Hammer, Rice, Scholar and Snyder while Research Data Depot is temporarily unavailable on cluster compute nodes. This will entail several unusual consequences which you should be aware of. Please refer to the following article for detailed explanations: Running Jobs on Community Clusters While Data Depot is Unavailable.

We will provide another update by 10am tomorrow or as soon as we have any additional information.

April 20, 2020 4:00pm EDT UPDATE:

The Data Depot filesystem check is progressing slowly but steadily and is currently at 83.7%. Data Depot remains unavailable on the Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters. Job scheduling on the clusters remains stopped.

We will provide another update by 10am tomorrow or as soon as we have any additional information.

April 20, 2020 9:58am EDT UPDATE:

Work continues on bringing Data Depot on Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters back to normal operation. Job scheduling remains stopped.

For low-impact access to your saved files and documents, you can use the SMB/Windows Network Drive method.

We will provide another update by 4pm or as soon as we have any additional information.

April 19, 2020 4:01pm EDT UPDATE:

The Data Depot filesystem check is progressing slowly but steadily and is currently at 73.5% (close to 3PB out of 3.9PB). Data Depot remains unavailable on the Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters, and job scheduling remains stopped.

For low-impact access to your saved files and documents, you can use the SMB/Windows Network Drive method.

We will provide another update by 10am tomorrow or as soon as we have any additional information.

April 19, 2020 9:50am EDT UPDATE:

Data Depot is still down on the Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters while the filesystem check continues. Job scheduling on the clusters remains stopped.

We will provide another update by 4pm or as soon as we have any additional information.

April 18, 2020 4:12pm EDT UPDATE:

Work continues on bringing up the Data Depot filesystem on the affected clusters. The filesystem check is progressing well, but more slowly than initially anticipated.

For low-impact access to your saved files and documents, you can use the SMB/Windows Network Drive method.

We appreciate how critical this service is for users of the clusters and are working around the clock to restore service as soon as possible. We will provide another update by 10am tomorrow or as soon as we have any additional information.

April 18, 2020 10:05am EDT UPDATE:

The repair of the Data Depot filesystem on the clusters is proceeding as expected. We will provide another update by 4pm today.

April 17, 2020 10:47pm EDT UPDATE:

Work continues on bringing Data Depot on Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters back to normal operation. Engineers have identified the source of the problem and are currently working on the fix. This process is expected to continue through the night.

We will provide another update by 10am tomorrow.

April 17, 2020 7:57pm EDT UPDATE:

Work continues on diagnosing the Data Depot problems on the Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters and bringing them back to normal operation. We will provide another update by midnight.

ORIGINAL:

The Brown, Gilbreth, Halstead, Hammer, Rice, Scholar, Snyder, and Workbench clusters began experiencing issues connecting to the Data Depot filesystem around 5:00pm EDT on Friday, April 17th, 2020. Engineers are currently diagnosing the issue and working to identify a fix. Job scheduling has been paused while this issue is being addressed.

We will provide an update by 8pm.
