Data Management

As science has become more data-intensive and collaborative, research data management is consuming more of researchers' most precious resource: time. In the era of "big data", decisions about data are more complex and carry larger consequences. Examples include the debate about reproducibility in computational science, funding agencies' mandates to make data gathered with taxpayer dollars public, and the challenge that staff turnover poses for efficient and effective data management.

Purdue Libraries can help provide effective solutions to these challenges.

For further information, please contact researchdata@purdue.edu or a liaison librarian to your department.

Research Networks

The Purdue Research Data Network is a high-speed network infrastructure designed to facilitate transfer of the large quantities of data produced by and analyzed on Purdue's high-performance computing systems. Based on the Science DMZ model developed by the Energy Sciences Network (ESnet), the research network connects to statewide and national research network infrastructures such as iLight and Internet2.

Some facts about the Research Data Network:

  • 100 Gb/second connection to national research networks
  • 160 Gb/second of bandwidth to central Research Data Depot storage
  • 160 Gb/second of bandwidth to each computational system
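To put these figures in perspective, the following is a minimal sketch that converts a link rate into an idealized transfer time. The function name, the 10 TB dataset size, and the efficiency parameter are illustrative assumptions, not part of the network's documentation; real transfers are also limited by disks, protocols, and the remote end.

```python
TB = 1e12  # one decimal terabyte, in bytes


def transfer_time_seconds(size_bytes: float, link_gbps: float,
                          efficiency: float = 1.0) -> float:
    """Idealized time to move size_bytes over a link of link_gbps Gb/second.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead; 1.0 means the full line rate is achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


if __name__ == "__main__":
    # Link rates taken from the list above; the dataset size is made up.
    for label, gbps in [("national research networks", 100),
                        ("Research Data Depot storage", 160)]:
        secs = transfer_time_seconds(10 * TB, gbps)
        print(f"10 TB at {gbps} Gb/second ({label}): {secs:.0f} s")
```

Under these assumptions, a 10 TB dataset takes 800 seconds at 100 Gb/second and 500 seconds at 160 Gb/second, both at full line rate.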

Labs and instruments that require high-bandwidth connections to research storage or computing resources are eligible to peer directly with the research network. Please contact help to discuss costs and other considerations.

This material is based upon work supported by the National Science Foundation under Grant No. 1827184.

Managing Sensitive and Restricted Data

RCAC provides several data management utilities for research and applications that use sensitive and restricted data.

The Protected Data Filesystem (PDFS) provides research groups with a secure, centralized storage solution for sensitive and restricted data. Designed for accessibility and reliability, PDFS supports collaborative research while ensuring compliance with sponsor requirements.  

The Protected Data Archive system is a long-term, multi-tiered file caching and storage system suitable for storing sensitive and restricted datasets. It utilizes both online disk and robotic tape drives.  

Rossmann is a centrally managed, NIST 800-171-compliant Community Cluster being launched to support compliance with the updated NIH Genomic Data Sharing (GDS) policy. Rossmann is designed to meet the latest NIH Security Best Practices for working with controlled-access human genomic data.

Weber, named in honor of Mary Ellen Weber, is a specialty high-performance computing cluster designed to support research requiring compliance with export control regulations. It provides a secure and powerful environment for data-sensitive applications.  

REED Folders are a managed storage solution built on Box.com for research labs working with regulated data. They provide secure, centralized storage designed for compliance and collaboration. 

REED+ Secured Research is a secure, managed Purdue ecosystem for handling sensitive and regulated research data. It combines compliance, high-performance computing, and flexible storage solutions to meet the strictest cybersecurity standards for unclassified data. 

Data Management Resources

RCAC offers a variety of tools and protocols to support secure and efficient data management for research workflows.  

Globus, a powerful platform for research data management, enables seamless, reliable, and secure file transfers, sharing, and publication of datasets from virtually anywhere. Purdue users can access the Globus web interface via transfer.rcac.purdue.edu, and RCAC has also provided Globus training resources for users seeking guidance.  

Traditional command-line utilities like SCP and Rsync are available for direct file transfer between systems. These tools are commonly included in Linux and macOS environments and can also be used on Windows.  
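As a rough illustration of how these utilities are typically invoked, the sketch below builds rsync and scp command lines from Python. The helper names, the host name, and the paths are hypothetical placeholders; substitute the login host and directories for your own system.

```python
import shlex


def rsync_command(source: str, dest: str, dry_run: bool = False) -> list[str]:
    """Build an rsync command: archive mode, verbose, keep partial files."""
    cmd = ["rsync", "-av", "--partial"]
    if dry_run:
        cmd.append("--dry-run")  # report what would be copied, copy nothing
    cmd += [source, dest]
    return cmd


def scp_command(source: str, dest: str, recursive: bool = False) -> list[str]:
    """Build an scp command; recursive=True copies whole directories."""
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")
    cmd += [source, dest]
    return cmd


if __name__ == "__main__":
    # Hypothetical remote target: replace user, host, and path with your own.
    remote = "myusername@cluster.example.edu:/path/to/destination/"
    print(shlex.join(rsync_command("results/", remote, dry_run=True)))
    print(shlex.join(scp_command("notes.txt", remote)))
    # To actually run a command: subprocess.run(cmd, check=True)
```

rsync is usually the better fit for large or repeated transfers because it skips files that are already up to date, while scp is convenient for one-off copies.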

Server Message Block (SMB) is a network file-sharing protocol that can be used to transfer files between your computer and RCAC systems. This method can be used on Windows, Linux, and macOS.

Secure File Transfer Protocol (SFTP) is a method for transferring files between two machines over an SSH connection. It offers more features than SCP, including operations on remote files such as listing, renaming, and deleting them.
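One way to script SFTP is the batch mode of the standard OpenSSH sftp client, sketched below. The helper names, host, and file names are hypothetical, and the sketch assumes key-based (non-interactive) authentication is already set up, since batch mode cannot prompt for a password. Paths containing spaces would need quoting, which is omitted here.

```python
def sftp_batch(remote_dir, uploads, downloads, renames=()):
    """Return the text of an sftp batch script.

    uploads and downloads are file names; renames is a sequence of
    (old_name, new_name) pairs applied to files on the remote side.
    """
    lines = [f"cd {remote_dir}"]
    lines += [f"put {path}" for path in uploads]
    lines += [f"get {path}" for path in downloads]
    lines += [f"rename {old} {new}" for old, new in renames]
    lines.append("ls -l")  # list the remote directory after the changes
    lines.append("bye")
    return "\n".join(lines) + "\n"


def sftp_command(user_host):
    """Build an sftp command that reads its batch script from stdin."""
    return ["sftp", "-b", "-", user_host]


if __name__ == "__main__":
    # Hypothetical directory and file names, for illustration only.
    batch = sftp_batch("/path/to/project", ["input.csv"], ["output.log"],
                       renames=[("draft.txt", "final.txt")])
    print(batch, end="")
    # To run it against a real host:
    # subprocess.run(sftp_command("myusername@cluster.example.edu"),
    #                input=batch, text=True, check=True)
```

The rename and ls steps show the kind of remote file operations that SCP alone does not provide.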

For long-term data storage, the Fortress Archive system is recommended. File archive tools such as HSI and HTAR can be used to bundle and transfer data.
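The sketch below shows, under stated assumptions, how HTAR and HSI invocations are commonly structured. The helper names, archive name, and paths are hypothetical, and the commands only work on a system where the htar and hsi clients are installed and configured for the archive; check the local documentation for the options supported there.

```python
import shlex


def htar_create(archive: str, directory: str) -> list[str]:
    """Bundle a local directory into a tar archive stored on the archive."""
    return ["htar", "-cvf", archive, directory]


def htar_list(archive: str) -> list[str]:
    """List the contents of an archive without retrieving it."""
    return ["htar", "-tvf", archive]


def htar_extract(archive: str) -> list[str]:
    """Extract an archive into the current local directory."""
    return ["htar", "-xvf", archive]


def hsi_put(local_path: str, remote_path: str) -> list[str]:
    """Copy one local file to the archive (hsi uses 'local : remote')."""
    return ["hsi", "put", local_path, ":", remote_path]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in (htar_create("project2024.tar", "project2024"),
                htar_list("project2024.tar"),
                hsi_put("summary.pdf", "summary.pdf")):
        print(shlex.join(cmd))
    # To actually run a command: subprocess.run(cmd, check=True)
```

Bundling many small files into one archive with HTAR before storing them is generally preferable to transferring them individually, since tape-backed systems handle a few large files better than many small ones.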