Overview of Data Depot
The Data Depot is a high-capacity, fast, reliable and secure data storage service designed, configured and operated for the needs of Purdue researchers in any field and shareable with both on-campus and off-campus collaborators.
As with the community clusters, research labs can purchase capacity in the Data Depot through the Data Depot Purchase page on this site. For more information, please contact us at email@example.com.
Data Depot Features
The Data Depot offers research groups in need of centralized data storage unique features and benefits:
Participation in the Community Cluster program is not required.
Any research group with storage needs may purchase space in the Data Depot in increments of 1 TB for a competitive annual price. All research groups at Purdue are eligible for a 100 GB Data Depot trial space free of charge.
The Data Depot is available as a Windows or Mac OS X network drive (CIFS/SMB) from personal and lab computers on campus, and is accessible by SCP/SFTP from anywhere. It is also tied in to Globus, allowing for fast and easy unattended transfer of large amounts of data between local systems or to and from national labs.
For Community Cluster users, your Data Depot space is mounted and accessible from all nodes and clusters as /depot/mylab, making it an ideal place to store shared group applications, tools, scripts, settings, documents, SVN or Git repositories, or even web applications. You may also maintain a standard group-wide set of shell startup and/or alias files so all your researchers can enjoy a consistent toolset and environment.
The Data Depot is designed to facilitate joint work on shared files across your research group, avoiding the need for numerous copies of critical datasets or complex permissions settings in individuals' home directories.
The Data Depot offers highly configurable access controls. ITaP will create Unix groups for all your researchers and assist you in setting appropriate permissions on files and directories to allow exactly the access you want and prevent any you do not.
Data Depot access management is under your direct control at all times. You or your designees can manage who has access by adding and removing researchers through a simple web application, the same one used to manage access to Community Cluster queues: Research Computing User Management.
All data kept in the Data Depot remains owned by the research group's lead faculty. Researchers and students frequently come and go from research groups, and when they leave, any files left in their personal home directories may become orphaned and difficult or impossible to recover. Unlike files in home directories, files kept in the Data Depot remain with the research group, unaffected by turnover, which can head off potentially difficult disputes.
The Data Depot is never subject to purging.
The Data Depot is redundant and protected against hardware failures and accidental deletion. All data is also mirrored at two different sites on campus to protect against physical disasters and provide far greater reliability and recoverability than existing research data storage methods on campus.
The Data Depot is not approved for regulated data, including HIPAA, ePHI, FISMA, or ITAR data.
The Data Depot is, however, suitable for non-HIPAA human subjects data. Contact ITaP Research Computing for a data security statement for your IRB documentation.
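The access-control feature above mentions Unix groups and permissions on files and directories. As a rough illustration only, and not ITaP's actual procedure, the following Python sketch computes the permission bits commonly used for a directory shared by a single Unix group: full access for owner and group, plus the setgid bit so that new files inherit the directory's group. The function names shared_dir_mode and describe are hypothetical helpers introduced for this example.

```python
import stat


def shared_dir_mode(others_can_read: bool = False) -> int:
    """Return permission bits for a directory shared by one Unix group.

    Assumption for illustration: owner and group get full access, and the
    setgid bit is set so files created inside inherit the directory's group.
    """
    mode = stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID
    if others_can_read:
        # Optionally let everyone else list and read the directory.
        mode |= stat.S_IROTH | stat.S_IXOTH
    return mode


def describe(mode: int) -> str:
    """Render the mode the way `ls -l` would show it for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    print(oct(shared_dir_mode()), describe(shared_dir_mode()))
```

A mode like this could be applied with os.chmod to a directory under your group's space (for example, somewhere beneath /depot/mylab). Which mode is right for a given lab depends on the access you actually want, so treat the values here as a starting point for discussion rather than a recommendation.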
Detailed Hardware Specification
The Data Depot uses an enterprise-class GPFS storage solution with an initial total capacity of over 2 PB. This storage is redundant and reliable, features regular snapshots, and is globally available on all ITaP research systems. The Data Depot is non-purged space suitable for tasks such as sharing data, editing files, developing and building software, and many other uses. Built on Data Direct Networks' SFA12k storage platform, the Data Depot has redundant storage arrays in multiple campus datacenters for maximum availability.
While the Data Depot will scale well for most uses, ITaP continues to recommend each cluster's parallel scratch filesystem as the high-performance working space for running jobs.
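One way to follow the recommendation above is to copy inputs from Data Depot space to scratch before a job runs and copy results back afterwards. The Python sketch below is a minimal illustration under stated assumptions: the stage function is a hypothetical helper, and temporary directories stand in for the real Data Depot and scratch locations, whose actual paths depend on your group and cluster.

```python
import shutil
import tempfile
from pathlib import Path


def stage(src: Path, dst_parent: Path) -> Path:
    """Copy the directory tree `src` into `dst_parent`; return the copy's path."""
    target = dst_parent / src.name
    # dirs_exist_ok (Python 3.8+) lets repeated runs refresh an existing copy.
    shutil.copytree(src, target, dirs_exist_ok=True)
    return target


if __name__ == "__main__":
    # Temporary directories stand in for the real filesystems in this demo.
    with tempfile.TemporaryDirectory() as tmp:
        depot = Path(tmp) / "depot"      # stand-in for group space such as /depot/mylab
        scratch = Path(tmp) / "scratch"  # stand-in for a cluster scratch directory
        (depot / "inputs").mkdir(parents=True)
        (depot / "inputs" / "data.txt").write_text("sample\n")
        scratch.mkdir()

        work = stage(depot / "inputs", scratch)  # stage in before the job
        results = scratch / "results"
        results.mkdir()
        # Placeholder for the real job: transform the input and write a result.
        (results / "out.txt").write_text((work / "data.txt").read_text().upper())
        saved = stage(results, depot)            # stage out after the job
        print("results saved to", saved)
```

For large datasets a dedicated transfer tool may be a better fit than a plain copy; this sketch only shows the overall stage-in, run, stage-out pattern.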
Data Depot Policies
The following ITaP research policies are particularly relevant to the Data Depot: