Home Directory User Guide

Home Directories Overview

Home directories for all ITaP research resources are provided by a DDN GS7KX filesystem appliance.

/home is the primary space used to permanently hold files for a given user. This space has a quota, which can be checked at any time with the myquota command.

Home Directories spaces currently reside on a self-contained GPFS storage system that provides redundant, high-availability disk space and is a central component of ITaP's research systems infrastructure.

ITaP uses network attached storage (NAS) appliances from DDN to provide scale-out Home Directories space to cluster systems. This storage is reliable, backed up (via snapshots), and globally available on all ITaP research systems. Home Directories is medium-performance, non-purged space suitable for tasks like sharing data, editing files, developing and building software, and many other uses.

Home Directories is not designed or intended for use as high-performance working space for running jobs.

File Storage and Transfer for Home Directories

Archive and Compression

There are several options for archiving and compressing groups of files or directories on ITaP research systems. The most commonly used options are:

tar

See the official documentation for tar for more information.

Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.

Examples:


  (list contents of archive somefile.tar)
$ tar tvf somefile.tar

  (extract contents of somefile.tar)
$ tar xvf somefile.tar

  (extract contents of gzipped archive somefile.tar.gz)
$ tar xzvf somefile.tar.gz

  (extract contents of bzip2 archive somefile.tar.bz2)
$ tar xjvf somefile.tar.bz2

  (archive all ".c" files in current directory into one archive file)
$ tar cvf somefile.tar *.c

  (archive and gzip-compress all files in a directory into one archive file)
$ tar czvf somefile.tar.gz somedirectory/

  (archive and bzip2-compress all files in a directory into one archive file)
$ tar cjvf somefile.tar.bz2 somedirectory/

Other arguments for tar can be explored by using the man tar command.
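
The commands above can be combined into a full round trip. The following is a minimal sketch, not an ITaP-provided script: the scratch directory and file names are made up for illustration. It creates a gzip-compressed archive, lists it, and extracts it into a separate directory using -C, which tells tar to change into the given directory before reading or writing files.

```shell
#!/bin/sh
# Hypothetical scratch area; every name below is illustrative only.
work=$(mktemp -d)
mkdir -p "$work/somedirectory" "$work/restore"
echo "hello" > "$work/somedirectory/a.txt"
echo "world" > "$work/somedirectory/b.txt"

# Create the archive from inside $work so the stored paths are relative.
tar czf "$work/somefile.tar.gz" -C "$work" somedirectory

# List the archive contents without extracting anything.
tar tzf "$work/somefile.tar.gz"

# Extract into a different directory instead of the current one.
tar xzf "$work/somefile.tar.gz" -C "$work/restore"
```

Storing relative paths and extracting with -C keeps a restore from overwriting the live copy of the directory. Remove the scratch area with rm -r "$work" when you are done.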

gzip

See the official documentation for gzip for more information.

The standard compression system for all GNU software.

Examples:


  (compress file somefile - also removes uncompressed file)
$ gzip somefile

  (uncompress file somefile.gz - also removes compressed file)
$ gunzip somefile.gz
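
Because gzip and gunzip remove their input file by default, it can be useful to keep the original while compressing. The sketch below (file names are placeholders, and the scratch directory is created only for the demonstration) uses the -c option, which writes to standard output and leaves the input untouched, and -t, which tests the integrity of a compressed file.

```shell
#!/bin/sh
# Hypothetical scratch area and file name, for illustration only.
work=$(mktemp -d)
printf 'some data\n' > "$work/somefile"

# -c writes the compressed stream to standard output, so somefile stays in place.
gzip -c "$work/somefile" > "$work/somefile.gz"

# Check the integrity of the compressed file without unpacking it.
gzip -t "$work/somefile.gz"

# Decompress to standard output, again leaving the .gz file in place.
gunzip -c "$work/somefile.gz" > "$work/somefile.copy"

# cmp exits with status 0 only if the two files are byte-for-byte identical.
cmp "$work/somefile" "$work/somefile.copy" && echo "round trip OK"
```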

bzip2

See the official documentation for bzip2 for more information.

Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.

Examples:


  (compress file somefile - also removes uncompressed file)
$ bzip2 somefile

  (uncompress file somefile.bz2 - also removes compressed file)
$ bunzip2 somefile.bz2

There are several other, less commonly used, options available as well:

  • zip
  • 7zip
  • xz

Sharing Files from Home Directories

Home Directories supports several methods for file sharing. Use the links below to learn more about these methods.

Sharing Data with Globus

Data on any Research Computing resource can be shared with other users within Purdue or with collaborators at other institutions. Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions.

To share files, log in to https://transfer.rcac.purdue.edu, navigate to the endpoint (collection) of your choice, and follow the instructions in the Globus documentation on how to share data.

File Transfer

Home Directories supports several methods for file transfer. Use the links below to learn more about these methods.

SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH protocol. SCP is available as a protocol choice in some graphical file transfer programs and also as a command line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.

Command-line usage:

  (to a remote system from local)
$ scp sourcefilename myusername@data.rcac.purdue.edu:somedirectory/destinationfilename

  (from a remote system to local)
$ scp myusername@data.rcac.purdue.edu:somedirectory/sourcefilename destinationfilename

  (recursive directory copy to a remote system from local)
$ scp -r sourcedirectory/ myusername@data.rcac.purdue.edu:somedirectory/

Linux / Solaris / AIX / HP-UX / Unix:

  • The "scp" command-line program should already be installed.

Microsoft Windows:

  • MobaXterm
    Free, full-featured, graphical Windows SSH, SCP, and SFTP client.

Mac OS X:

  • The "scp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • Cyberduck is a full-featured and free graphical SFTP and SCP client.

Globus

Globus, previously known as Globus Online, is a powerful and easy-to-use file transfer service for transferring files virtually anywhere. It works within ITaP's various research storage systems, it connects ITaP to remote research sites running Globus, and it connects research systems to personal systems. You may use Globus to connect to your home, scratch, and Fortress storage directories. Since Globus is web-based, it works on any operating system that is connected to the internet. The Globus Personal client is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer, but it can also be used over the command line.

Globus Web:

  • Navigate to https://transfer.rcac.purdue.edu
  • Click "Proceed" to log in with your Purdue Career Account.
  • On your first login it will ask to make a connection to a Globus account. If you already have one - sign in to associate with your Career Account. Otherwise, click the link to create a new account.
  • Now you are at the main screen. Click "File Transfer" which will bring you to a two-endpoint interface.
  • You will need to select one endpoint on one side as the source, and a second endpoint on the other as the destination. Each can be one of several Purdue endpoints, an endpoint at another university, or your personal computer (see the Personal Client section below).

The ITaP Research Computing endpoints are as follows. A search for "Purdue" will give you several suggested results you can choose from, or you can give a more specific search.

  • Research Data Depot: "Purdue Research Computing - Data Depot", a search for "Depot" should provide appropriate matches to choose from.
  • Fortress: "Purdue Fortress HPPS Archive", a search for "Fortress" should provide appropriate matches to choose from.

From here, select a file or folder on either side of the two-pane window, and then use the arrows in the top-middle of the interface to instruct Globus to move files from one side to the other. You can transfer files in either direction. You will receive an email once the transfer is completed.

Globus Personal Client setup:

Globus Connect Personal is a small software tool you can install to make your own computer a Globus endpoint on its own. It is useful if you need to transfer files via Globus to and from your computer directly.

  • On the endpoint page from earlier, click "Get Globus Connect Personal" or download a version for your operating system from the Globus Connect Personal page.
  • Name this particular personal system and follow the setup prompts to create your Globus Connect Personal endpoint.
  • Your personal system is now available as an endpoint within the Globus transfer interface.

Globus Command Line:

Globus supports a command line interface, allowing advanced automation of your transfers.

To use the recommended standalone Globus CLI application (the globus command):

Sharing Data with Outside Collaborators

Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation on how to share data.

For links to more information, please see the Globus Support page.

Windows Network Drive / SMB

SMB (Server Message Block), also known as CIFS, is an easy-to-use file transfer protocol that is useful for transferring files between ITaP research systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer, but it can also be used over the command line.

Note: to access Home Directories through SMB file sharing, you must be on a Purdue campus network or connected through VPN.

Windows:

  • Windows 7: Click Windows menu > Computer, then click Map Network Drive in the top bar
  • Windows 8 & 10: Tap the Windows key, type computer, select This PC, click Computer > Map Network Drive in the top bar
  • In the folder location enter the following information and click Finish:

    • To access your home directory, enter \\home.rcac.purdue.edu\myusername.
    • To access your scratch space on Home Directories, enter \\scratch.${resource.frontend}.rcac.purdue.edu\${resource.frontend}. Once mapped, you will be able to navigate to your scratch directory.

  • Your home or scratch directory should now be mounted as a drive in the Computer window.

Mac OS X:

  • In the Finder, click Go > Connect to Server
  • In the Server Address enter the following information and click Connect:

    • To access your home directory, enter smb://home.rcac.purdue.edu/myusername.
    • To access your scratch space on Home Directories, enter smb://scratch.${resource.frontend}.rcac.purdue.edu/${resource.frontend}. Once connected, you will be able to navigate to your scratch directory.

Linux:

  • There are several graphical methods to connect in Linux depending on your desktop environment. Once you find out how to connect to a network server on your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
  • If you would like access via Samba on the command line, you may install smbclient, which gives you FTP-like access and can be used as shown below. For all the possible ways to connect, look at the Mac OS X instructions.
    smbclient //home.rcac.purdue.edu/myusername -U myusername
    smbclient //scratch.${resource.frontend}.rcac.purdue.edu/${resource.frontend} -U myusername

FTP / SFTP

ITaP does not support FTP on any ITaP research systems because it does not allow for secure transmission of data. Use SFTP instead, as described below.

SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Older command-line SFTP clients cannot recursively copy directory contents (recent OpenSSH versions accept get -r and put -r); if yours cannot, use SCP or a graphical SFTP client.

Command-line usage:

$ sftp -B buffersize myusername@data.rcac.purdue.edu

      (to a remote system from local)
sftp> put sourcefile somedir/destinationfile
sftp> put -P sourcefile somedir/

      (from a remote system to local)
sftp> get sourcefile somedir/destinationfile
sftp> get -P sourcefile somedir/

sftp> exit
  • -B: optional, specify buffer size for transfer; larger may increase speed, but costs memory
  • -P: optional, preserve file attributes and permissions
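
For repeated transfers, the interactive commands above can be placed in a batch file and passed to sftp with its -b option. This is a sketch under assumptions: the file names are placeholders, and batch mode cannot prompt for a password, so it only works if non-interactive (for example key-based) login is available for your account. The sftp line itself is left commented out so the snippet can be tried without a network connection.

```shell
#!/bin/sh
# Write the transfer commands to a batch file; all file names are placeholders.
batch=$(mktemp)
cat > "$batch" <<'EOF'
put sourcefile somedir/destinationfile
get somedir/results.txt results.txt
exit
EOF

# Show the commands that would be run.
cat "$batch"

# Hand the batch file to sftp (requires non-interactive authentication):
# sftp -b "$batch" myusername@data.rcac.purdue.edu
```

In batch mode sftp stops at the first command that fails, so a missing file aborts the remaining transfers rather than silently skipping them.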

Linux / Solaris / AIX / HP-UX / Unix:

  • The "sftp" command-line program should already be installed.

Microsoft Windows:

  • MobaXterm
    Free, full-featured, graphical Windows SSH, SCP, and SFTP client.

Mac OS X:

  • The "sftp" command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • Cyberduck is a full-featured and free graphical SFTP and SCP client.

Lost File Recovery

Home Directories is protected against accidental file deletion through a series of snapshots taken every night just after midnight. Each snapshot provides the state of your files at the time the snapshot was taken. It does so by storing only the files which have changed between snapshots. A file that has not changed between snapshots is only stored once but will appear in every snapshot. This is an efficient method of providing snapshots because the snapshot system does not have to store multiple copies of every file.

These snapshots are kept for a limited time at various intervals. ITaP keeps nightly snapshots for 7 days, weekly snapshots for 4 weeks, and monthly snapshots for 3 months. This means you will find snapshots from the last 7 nights, the last 4 Sundays, and the first day of each of the last 3 months. Files are available going back between two and three months, depending on how long ago the most recent first of the month was. Snapshots beyond this are not kept.

Only files which have been saved during an overnight snapshot are recoverable. If you lose a file the same day you created it, the file is not recoverable because the snapshot system has not had a chance to save the file.

Snapshots are not a substitute for regular backups. It is the responsibility of the researchers to back up any important data to the Fortress Archive. Home Directories does protect against hardware failures or physical disasters through other means; however, these other means are also not substitutes for backups.

Home Directories offers several ways for researchers to access snapshots of their files.

flost

If you know when you lost the file, the easiest way is to use the flost command. This tool is available from any ITaP Research Computing resource. If you do not have access to a compute cluster, any Home Directories user may use an SSH client to connect to data.rcac.purdue.edu and run this command.

To run the tool you will need to specify the location where the lost file was with the -w argument:

$ flost -w /depot/mylab

Replace mylab with the name of your lab's Home Directories directory. If you know more specifically where the lost file was you may provide the full path to that directory.

This tool will prompt you for the date on which you lost the file or would like to recover the file from. If the tool finds an appropriate snapshot it will provide instructions on how to search for and recover the file.

If you are not sure what date you lost the file, you may try entering different dates into flost to find the file, or you may manually browse the snapshots as described below.

Manual Browsing

You may also search through the snapshots by hand on the Home Directories filesystem if you are not sure what date you lost the file or would like to browse by hand. Snapshots can be browsed from any ITaP Research Computing resource. If you do not have access to a compute cluster, any Home Directories user may use an SSH client to connect to data.rcac.purdue.edu and browse from there. The snapshots are located at /depot/.snapshots on these resources.

You can also mount the snapshot directory over Samba (or SMB, CIFS) on Windows or Mac OS X. Mount (or map) the snapshot directory in the same way as you did for your main Home Directories space substituting the server name and path for \\datadepot-new.rcac.purdue.edu\depot\.winsnaps (Windows) or smb://datadepot-new.rcac.purdue.edu/depot/.winsnaps (Mac OS X). Note: use datadepot.rcac.purdue.edu instead of datadepot-new in the above examples if your Data Depot space has not been migrated to new hardware yet.

Once connected to the snapshot directory through SSH or Samba, you will see something similar to this:

Snapshot folders may look slightly different when accessed via SSH on data.rcac.purdue.edu or via Samba on datadepot-new.rcac.purdue.edu. Here are examples of both.

SSH to data.rcac.purdue.edu:

$ cd /depot/.snapshots
$ ls -1
daily_20190129000501
daily_20190130000501
daily_20190131000502
daily_20190201000501
daily_20190202000501
daily_20190203000501
daily_20190204000501
monthly_20181101001501
monthly_20181201001501
monthly_20190101001501
monthly_20190201001501
weekly_20190113002501
weekly_20190120002501
weekly_20190127002501
weekly_20190203002501

Samba mount on datadepot-new.rcac.purdue.edu:

[Screenshot: Home Directories snapshots via Samba]

Each of these directories is a snapshot of the entire Home Directories filesystem at the timestamp encoded into the directory name. The format for this timestamp is year, two digits for month, two digits for day, followed by the time of the day.
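
As an illustration of this naming scheme, the following sketch splits a snapshot directory name into its type, date, and time. The function name and output layout are hypothetical and not part of any ITaP tool, and reading the last six digits as hours, minutes, and seconds is an assumption based on the names shown above.

```shell
#!/bin/sh
# Split a snapshot directory name such as daily_20190129000501 into its parts.
# describe_snapshot is an illustrative helper, not an ITaP-provided command.
describe_snapshot() {
    name=$1
    kind=${name%%_*}     # text before the first underscore: daily, weekly, monthly
    stamp=${name#*_}     # the 14-digit timestamp after the underscore
    # Cut the timestamp into fixed-width fields (assumed layout: YYYYMMDDhhmmss).
    year=$(printf '%s' "$stamp" | cut -c1-4)
    month=$(printf '%s' "$stamp" | cut -c5-6)
    day=$(printf '%s' "$stamp" | cut -c7-8)
    hour=$(printf '%s' "$stamp" | cut -c9-10)
    minute=$(printf '%s' "$stamp" | cut -c11-12)
    second=$(printf '%s' "$stamp" | cut -c13-14)
    printf '%s %s-%s-%s %s:%s:%s\n' "$kind" "$year" "$month" "$day" "$hour" "$minute" "$second"
}

describe_snapshot daily_20190129000501   # prints: daily 2019-01-29 00:05:01
```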

You may cd into any of these directories where you will find the entire Home Directories filesystem. Use cd to continue into your lab's Home Directories space and then you may browse the snapshot as normal.

If you are browsing these directories over a Samba network drive, you can simply drag and drop the files into your live Home Directories folder.

Once you find the file you are looking for, use cp to copy the file back into your lab's live Home Directories space. Do not attempt to modify files directly in the snapshot directories.
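
When you do not know which snapshot holds the file, a short loop can check every snapshot for it. This is a rough sketch rather than an official tool: find_in_snapshots is a hypothetical helper, and the snapshot root and relative path are passed as arguments so the function can be tried on any directory tree. The paths in the trailing comments are placeholders.

```shell
#!/bin/sh
# Print the path of a file in every snapshot that contains it.
#   $1: directory holding the snapshot folders
#   $2: path of the lost file relative to the top of each snapshot
find_in_snapshots() {
    snapshot_root=$1
    relpath=$2
    for snap in "$snapshot_root"/*/; do
        # Each $snap ends with a slash, e.g. .../daily_20190203000501/
        if [ -e "$snap$relpath" ]; then
            printf '%s\n' "$snap$relpath"
        fi
    done
}

# Example with placeholder paths (mylab and results/data.csv are made up):
#   find_in_snapshots /depot/.snapshots mylab/results/data.csv
# Then copy the chosen version back into the live space:
#   cp /depot/.snapshots/daily_20190203000501/mylab/results/data.csv /depot/mylab/results/
```

Running ls -l on the printed paths shows each copy's size and modification time, which helps in choosing the version to restore.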

Windows

If you use Home Directories through "network drives" on Windows you may recover lost files directly from within Windows:

  • Open the folder that contained the lost file.
  • Right click inside the window and select "Properties".
  • Click on the "Previous Versions" tab.
  • A list of snapshots will be displayed.
  • Select the snapshot from which you wish to restore.
  • In the new window, locate the file you wish to restore.
  • Drag the file or folder to its correct location.

In the "Previous Versions" window the list contains two columns. The first column is the timestamp on which the snapshot was taken. The second column is the date on which the selected file or folder was last modified in that snapshot. This may give you some extra clues to which snapshot contains the version of the file you are looking for.

Mac OS X

Mac OS X does not provide any way to access the Home Directories snapshots directly. To access the snapshots there are two options: browse the snapshots by hand through a network drive mount or use an automated command-line based tool.

To browse the snapshots by hand, follow the directions outlined in the Manual Browsing section.

To use the automated command-line tool, log into an ITaP Research Computing cluster or into the host data.rcac.purdue.edu (which is available to all Home Directories users) and use the flost tool. On Mac OS X you can use the built-in SSH terminal application to connect.

  • Open the Applications folder from Finder.
  • Navigate to the Utilities folder.
  • Double click the Terminal application to open it.
  • Type the following command when the terminal opens.
    $ ssh myusername@data.rcac.purdue.edu
    Replace myusername with your Purdue Career Account username and provide your password when prompted.

Once logged in use the flost tool as described above. The tool will guide you through the process and show you the commands necessary to retrieve your lost file.

Frequently Asked Questions

Some common questions, errors, and problems are categorized below.

About Home Directories

Do I need to do anything to my firewall to access Home Directories?

No firewall changes are needed to access Home Directories. However, to access data through Network Drives (i.e., CIFS, "Z: Drive"), you must be on a Purdue campus network or connected through VPN.

Data

Can I store export-controlled data on Home Directories?

Home Directories is not approved for storing export-controlled data, including data subject to ITAR, FISMA, DFAR-7012, or NIST 800-171. Please contact the Export Control Office to discuss technology control plans and data storage appropriate for export-controlled projects.

Can I store HIPAA data on Home Directories?

Home Directories is not approved for storing data covered by HIPAA. Please contact the HIPAA Compliance Office to discuss HIPAA-compliant data storage.

Can I share data with outside collaborators?

Yes! Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation on how to share data.
