Unscheduled outage of multiple clusters and Data Depot

UPDATE: October 20, 2021  6:11pm

Engineers have worked with the vendor to identify and isolate the root cause and have put a mitigation in place. As of 6:10pm, the load on the Data Depot servers has returned to normal, and filesystem performance is back within expected parameters. We will continue monitoring the filesystem to ensure its availability and performance.

We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

UPDATE: October 20, 2021  12:20pm

The load on the Data Depot servers spiked overnight and remains very high, significantly degrading overall filesystem performance. Engineers are working with the vendor to identify the problem.

We will provide an update by 6pm.

UPDATE: October 19, 2021  9:49pm

As of 9:49pm, the Data Depot load has stabilized, the offending processes have been terminated, and the Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, and Workbench clusters have been returned to normal service. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.

ORIGINAL: October 19, 2021 4:30pm - October 20, 2021 6:15pm EDT

The Bell, Brown, Gilbreth, Halstead, Hammer, Scholar, and Workbench clusters and Data Depot began experiencing issues caused by intermittent high load on the Data Depot servers around 4:30pm. Engineers are currently diagnosing the issue and working to identify a fix.

We will provide an update by 8pm.

Originally posted: October 19, 2021 6:15pm EDT