Data Depot Migration FAQ
The following are highlights and answers to frequently asked questions about the Data Depot Hardware Replacement and Migration.
My Data Depot space is so slow after the migration!
UPDATE: August 18, 2021: In the past few weeks we have received a number of reports about decreased performance, slow access and frequent disconnects when working with migrated Data Depot spaces (both on the clusters and via mapped network drives on personal computers). We would like to provide an update and status report on the issue.
The root cause has been identified as subpar performance of the metadata subsystem in the new Depot, which causes many metadata lookup operations to time out or retry. Per the vendor's recommendation, an order for additional SSDs for the metadata subsystem was placed more than a month ago. Unfortunately, the global semiconductor shortage has gotten in the way, and the vendor is currently unable to procure the necessary drives or fulfill the order. We are not alone in this situation: several large projects at peer organizations are also waiting on their vendors for flash storage tiers. We are in active discussions with the vendor about alternative procurement sources and options.
In addition, the vendor is actively troubleshooting the frequent disconnects that our network drive users have reported. We believe we are approaching a solution to this issue (within the confines of the overarching metadata issue).
While this may not be suitable for every workflow, we highly recommend Globus for large-scale data transfers to and from Data Depot, as it adds reliability and resilience. Please see the Globus section of the user guide for more details on Globus transfers. Cluster users may additionally benefit from shifting the bulk of their intermediate processing from Data Depot to the much more performant cluster scratch filesystems (and copying final results back to Depot at the end of processing). Please reach out to us if you would like to discuss your lab data workflows and brainstorm possible enhancements to them.
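As a rough illustration of the scratch-staging idea above, the following Python sketch copies inputs from a Depot folder to a scratch working directory, runs all processing there, and copies only the final results back. It is a minimal sketch under stated assumptions, not an official workflow: `run_staged`, `count_lines`, and all directory arguments are hypothetical names chosen for this example, and the actual Depot and scratch paths for your lab will differ.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(depot_input: Path, depot_output: Path, scratch_root: Path, process) -> Path:
    """Stage inputs to scratch, process there, and copy final results back.

    All arguments are hypothetical placeholders: pass your own Depot input
    folder, Depot output folder, and scratch directory.
    """
    # Private working directory on the (faster) scratch filesystem.
    work = Path(tempfile.mkdtemp(prefix="job-", dir=scratch_root))
    try:
        staged_in = work / "input"
        staged_out = work / "output"
        shutil.copytree(depot_input, staged_in)  # one bulk read from Depot
        staged_out.mkdir()
        process(staged_in, staged_out)  # intermediate I/O stays on scratch
        # One bulk write of final results back to Depot (needs Python 3.8+).
        shutil.copytree(staged_out, depot_output, dirs_exist_ok=True)
    finally:
        # Remove the scratch working copy whether or not processing succeeded.
        shutil.rmtree(work, ignore_errors=True)
    return depot_output


def count_lines(src: Path, dst: Path) -> None:
    """Toy processing step: record the line count of every input file."""
    for f in sorted(src.iterdir()):
        n = len(f.read_text().splitlines())
        (dst / f"{f.name}.count").write_text(f"{n}\n")
```

A call would look like `run_staged(Path(depot_in), Path(depot_out), Path(scratch_dir), count_lines)`, with the three directories replaced by your own locations. The design choice is to touch Data Depot only twice, once to read inputs and once to write results, so that the many small intermediate file operations never reach the Depot metadata subsystem.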
We greatly appreciate your patience and understanding during this transition period. Migrating 2.5 PB while the system stays live is not an easy feat. Things will get better, but we still have some rough seas ahead of us at the moment. Please contact email@example.com if you have any questions or concerns.
UPDATE: July 6, 2021: The background scan task that was severely affecting Data Depot performance during the past couple of weeks completed successfully over the weekend of July 4th. Data Depot responsiveness and performance are now back to normal. Please let us know if you still see any abnormalities.
ORIGINAL: June 26, 2021: Since June 21, we have received a number of reports about decreased performance and slow access to migrated Data Depot spaces. This is a known issue caused by high load from a background filesystem scanning task. The task must be allowed to run to completion as part of the hardware migration process to re-confirm the integrity and correctness of all transfers.
The scanning task has been running since June 11. We do not have an exact estimate of when it will finish, but we anticipate it will be done within the next few days to a week or so. We appreciate your patience and understanding during this transition period. Please reach out to us if you have any questions or concerns.
How do I know if my Data Depot space has been migrated?
- The majority of spaces were migrated during the May 11-12 maintenance.
- All members of each Data Depot space have been sent emails listing the migration status of their space(s).
- Check with a handy web-based tool: https://www.rcac.purdue.edu/account/myinfo/
- For the spaces that were not migrated during the main transition downtime, we will work individually with the affected research groups to schedule a convenient migration time in the coming weeks.
Do I need to do anything?
- If your space has not been migrated yet: no special action is necessary. Continue using Data Depot the same way you have been.
- If your space has been migrated:
  - Network drive users will need to disconnect their old network drives and re-map them using the new server. See the "How do I access my Data Depot space if it has been migrated?" question below.
  - All other access methods continue functioning as before (no special action is necessary).
How do I access my Data Depot space if it has been migrated?
- On the clusters: no change.
- Globus: no change. It is the same Purdue Research Computing - Data Depot endpoint (and the same shared collections off of it).
- SCP and SFTP access: no change.
- Sharing via WWW: no change.
- Mac/Windows network drives (SMB/CIFS): small change
  - Very similar to before, but substitute datadepot-new.rcac.purdue.edu as the server name.
  - The User Guide section has been updated to reflect the changes.
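To make the server-name substitution for network drives concrete, here is a small Python sketch that builds the Mac (`smb://`) and Windows (UNC) forms of a network drive address using the new server name. Only the server name comes from this FAQ. The helper `smb_addresses` and the share path `depot/mylab` are hypothetical and used purely for illustration; please take the actual share path for your lab from the User Guide.

```python
NEW_SERVER = "datadepot-new.rcac.purdue.edu"  # new server name given in this FAQ


def smb_addresses(share_path: str, server: str = NEW_SERVER) -> dict:
    """Return the Mac (smb://) and Windows (UNC) forms of a network drive address.

    share_path is a placeholder such as "depot/mylab" (an assumption made for
    this example, not a path confirmed by the FAQ).
    """
    # Accept either slash style and ignore empty pieces from leading/trailing slashes.
    parts = [p for p in share_path.replace("\\", "/").split("/") if p]
    if not parts:
        raise ValueError("share_path must name at least one folder")
    return {
        "mac": "smb://" + server + "/" + "/".join(parts),
        "windows": "\\\\" + server + "\\" + "\\".join(parts),
    }


if __name__ == "__main__":
    for system, address in smb_addresses("depot/mylab").items():
        print(f"{system}: {address}")
```

With the hypothetical share path above, the Mac form is `smb://datadepot-new.rcac.purdue.edu/depot/mylab` and the Windows form is `\\datadepot-new.rcac.purdue.edu\depot\mylab`.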
How do I access my Data Depot space if it has not been migrated?
- No change. Continue operating the very same way you have been doing.
- We will reach out to your lab shortly to coordinate the time for migration.
What happens to my old Data Depot space after it has been migrated?
- After each Data Depot space is copied onto the new hardware platform, the old data is made read-only and locked down on the old servers for the duration of the entire transition period.
- Once every space is successfully migrated, the old hardware will be decommissioned.
How long will this Data Depot transition project take?
- The majority of Data Depot spaces were migrated during the May 11-12 maintenance. For the several spaces that could not be completed during the maintenance time frame, we will work with space owners to schedule their migrations in the following weeks.
- We anticipate completing all data migrations during the summer of 2021. Once all data transfers are complete, we will announce one more scheduled maintenance downtime to finalize all changes (likely in mid-July or early August).