Protected Data Filesystem User Guide

The Protected Data Filesystem (PDFS) is a high-capacity, fast, reliable, and secure data storage service designed, configured, and operated for Purdue researchers who need high-performance computing to work with sensitive and restricted data.

Protected Data Filesystem Overview

As with the community clusters, research labs can purchase capacity in the PDFS through the PDFS Purchase page on this site. For more information, please contact us.

Protected Data Filesystem Features

The Protected Data Filesystem (PDFS) offers research groups in need of centralized data storage unique features and benefits:

  • Available

    To any Purdue research group working with sensitive or restricted data. Capacity can be purchased in increments of 1 TB at a competitive annual price, or you may request a 100 GB trial space free of charge. Participation in the Community Cluster program is not required.

  • Accessible
  • Capable

    The PDFS facilitates joint work on protected datasets across your research group, providing a central place for datasets requiring higher levels of security to meet sponsor requirements.

  • Controllable Access

    Access management is under your direct control. Unix groups can be created for your group and staff can assist you in setting appropriate permissions to allow exactly the access you want and prevent any you do not. Easily manage who has access through a simple web application — the same application used to manage access to Community Cluster queues.

  • Data Retention

    All data kept in the PDFS remains owned by the research group's lead faculty. When researchers or students leave your group, any files left in their home directories may become difficult to recover. Files kept in PDFS remain with the research group, unaffected by turnover, and could head off potentially difficult disputes.

  • Never Purged

    The PDFS is never subject to purging.

  • Reliable

    The PDFS is redundant and protected against hardware failures.

  • Restricted Data

    The PDFS is suitable for sensitive and restricted datasets. Example datasets that have been reviewed and approved include NIH Database of Genotypes and Phenotypes (dbGaP), licensed datasets such as the UK Biobank, and deidentified human genomic data. The PDFS is not approved for export controlled data subject to ITAR, or CUI.

Protected Data Filesystem Hardware Details

The PDFS uses an enterprise-class Lustre storage solution with an initial total capacity of over 2 PB. This storage is redundant and reliable, and is available today on the Negishi cluster. The PDFS is non-purged space suitable for tasks such as hosting datasets, processing protected data, editing files, developing and building software, and many other uses. The PDFS is built on Data Direct Networks' 400NVX2 storage platform. 

 

Data Security Standards

The Protected Data Filesystem (PDFS) has been reviewed by the Purdue System Security Information Assurance team and found to meet or exceed the requirements for controlled access to dbGaP data, human genomics data, and data requiring similar levels of protection.

The PDFS is a shared Linux computing system physically located in the Purdue Research Data Center. Purdue IT facilities are configured with a high level of physical security. All building access is controlled by the Purdue card office, badged, and logged. All Purdue IT facilities and processes are certified ISO 9000, 27001, and 20000-1.

All data is stored on RAID arrays attached to file servers on a private, non-routable network. Protected datasets are only accessible from within Purdue HPC systems. None of those servers are directly accessible from the Internet. All of the community cluster internal networks are isolated from the Internet on private networks, with login nodes protected by firewall rules. The research network entry points are further protected with intrusion detection systems. All servers within RCAC are additionally protected by local firewalls.

All data is stored in directories (folders) with Linux file access controls restricting access to owner and group. Group membership is set by the owner. The top-level permissions on these directories are set by the system and unchangeable by individuals. Groups and accounts are reviewed annually by the primary investigator.
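
As a rough illustration of the owner-and-group restriction described above, the following Python sketch checks whether a directory grants any permission to users outside its owner and group. The function names are hypothetical and chosen for this example only, and the sketch is not part of any PDFS tooling; it uses only the standard `os` and `stat` modules.

```python
import os
import stat


def is_restricted_to_owner_and_group(path: str) -> bool:
    """Return True if 'other' users have no read, write, or execute bits on path."""
    mode = os.stat(path).st_mode
    # S_IRWXO masks the three permission bits that apply to everyone
    # who is neither the owner nor a member of the owning group.
    return (mode & stat.S_IRWXO) == 0


def describe_access(path: str) -> str:
    """Return an ls-style permission string (for example 'drwxrwx---') for path."""
    return stat.filemode(os.stat(path).st_mode)
```

You could call `is_restricted_to_owner_and_group()` on a directory you own to confirm that a newly created subdirectory has not been opened to other users. The top-level directory permissions are set by the system, so a check like this is mainly useful for files and subdirectories that your group creates.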

All user access to the system is password controlled. All users of the system are bound by Purdue IT Policies and Standards. Remote access to the servers is via encrypted transport (i.e. SSH). No data is exported to non-compliant systems.

Privileged access accounts are approved by the RCAC staff, documented and restricted to the specific staff members responsible for maintaining the cluster. All privileged access is logged. All system components are kept up-to-date with security patches.

 

Specific Example Approved Datasets (as of Summer 2024)

  • NIH dbGaP
  • UK Biobank
  • Human Genomic Data

 

Cybersecurity Standards and Dataset Approval Process

Purdue IT high-performance computing resources are built and operated in line with the best practices in NIST SP 800-223, "High-Performance Computing Security: Architecture, Threat Analysis, and Security Posture". They are approved for data subject to the NIH Genomic Data Sharing Policy and are certified ISO 27001.

Data security requirements are driven by the contract and data use/material transfer/data transfer agreement. New data use agreements are reviewed by contract analysts and Purdue System Security (PSS) Information Assurance analysts, and matched to IT resources.

Sponsor-specific data security requirements must be reviewed by PSS analysts before any data is uploaded into the PDFS.
