Unscheduled Fortress outage

September 1, 2020  12:30am – September 3, 2020  5:00pm
Fortress

UPDATE: September 3, 2020  4:59pm

As of 4:59pm, the Fortress tape archive has been returned to normal service, and all transfer methods have been re-enabled. We apologize for the disruption of service. Please report any issues to rcac-help@purdue.edu.


UPDATE: September 3, 2020  9:37am

The Fortress disk cache has improved steadily while transfers have been paused, and work continues on bringing Fortress back to normal operation. All ingress and egress operations via Globus, HSI, HTAR and SFTP remain paused.

We appreciate your patience and will provide another update by 5pm tonight or sooner.


UPDATE: September 2, 2020  4:59pm

The state of the Fortress disk cache subsystem has improved significantly, and work continues on bringing it back to normal operation. All ingress and egress operations via Globus, HSI, HTAR and SFTP remain paused.

We appreciate your patience and will provide another update by 10am tomorrow.


UPDATE: September 2, 2020  9:25am

Work continues on the Fortress disk cache. All ingress and egress operations via Globus, HSI, HTAR and SFTP remain paused.

We will provide another update by 5pm today.


UPDATE: September 1, 2020  4:59pm

Work continues on the Fortress disk cache. All ingress and egress operations via Globus, HSI, HTAR and SFTP remain paused.

We will provide another update by 10am tomorrow.


ORIGINAL: September 1, 2020  9:14am

The Fortress tape archive began experiencing issues caused by a full disk cache subsystem on Tuesday, September 1st, 2020 around 12:30am. The problems appear as intermittent "Error -1", "Error -28", and "No space left on device" messages in HSI, HTAR and Globus. Both ingestion and staging of data are affected.

While engineers work on mitigation, all HSI, HTAR, SFTP and Globus transfers to and from Fortress are temporarily paused to minimize the load on the disk cache and give it a chance to clear. While transfers are paused, some tools may additionally report "Error -111".
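For users who script their transfers, the following is a minimal Python sketch of how the error codes mentioned above might be recognized in a wrapper. It rests on an assumption not confirmed in this notice: that -28 and -111 are negated Linux errno values (ENOSPC and ECONNREFUSED), which is consistent with the "No space left on device" message. The -1 code is treated as a generic failure. The names `OUTAGE_CODES`, `describe`, and `is_outage_related` are hypothetical and are not part of HSI, HTAR, or Globus.

```python
# Assumption: negative codes are negated Linux errno values.
# The values are hardcoded so the table reads the same on any platform.
OUTAGE_CODES = {
    1: "generic failure",
    28: "ENOSPC: No space left on device",
    111: "ECONNREFUSED: Connection refused",
}


def describe(code: int) -> str:
    """Return a readable description for a code such as -28."""
    meaning = OUTAGE_CODES.get(abs(code), "not listed in this notice")
    return f"Error {code}: {meaning}"


def is_outage_related(code: int) -> bool:
    """True if the code is one of those named in this notice."""
    return abs(code) in OUTAGE_CODES


if __name__ == "__main__":
    for code in (-1, -28, -111, -5):
        print(describe(code), "| outage-related:", is_outage_related(code))
```

A transfer script could use `is_outage_related` to decide whether to hold a failed transfer for a later retry instead of reporting it as a new problem.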

We will provide an update by 5pm.

Originally posted: September 1, 2020  9:14am