
File Storage and Transfer

Learn more about file storage and transfer for Scholar.

Archive and Compression


There are several options for archiving and compressing groups of files or directories. The most commonly used options are:

 

tar

See the official documentation for tar for more information.

Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.

Examples:


  (list contents of archive somefile.tar)
$ tar tvf somefile.tar

  (extract contents of somefile.tar)
$ tar xvf somefile.tar

  (extract contents of gzipped archive somefile.tar.gz)
$ tar xzvf somefile.tar.gz

  (extract contents of bzip2 archive somefile.tar.bz2)
$ tar xjvf somefile.tar.bz2

  (archive all ".c" files in current directory into one archive file)
$ tar cvf somefile.tar *.c

  (archive and gzip-compress all files in a directory into one archive file)
$ tar czvf somefile.tar.gz somedirectory/

  (archive and bzip2-compress all files in a directory into one archive file)
$ tar cjvf somefile.tar.bz2 somedirectory/

Other arguments for tar can be explored by using the man tar command.
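
Before deleting the original files, it can be worth confirming that a new archive is readable. The following is a minimal sketch, assuming GNU tar and bash; the function name make_archive and the example paths are hypothetical and not part of any RCAC tooling:

```shell
#!/bin/bash
# Hypothetical helper: archive a directory with gzip compression, then
# confirm the archive can be read back before anything is deleted.
make_archive() {
    local src_dir="$1"      # directory to archive
    local archive="$2"      # output file, e.g. somefile.tar.gz

    # -C switches to the parent directory first, so the archive stores
    # paths relative to it (somedirectory/...) rather than absolute paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

    # Listing the archive reads it from start to end; a failure here
    # means the archive should not be trusted.
    tar -tzf "$archive" > /dev/null || return 1
    echo "OK: $archive"
}
```

For example, `make_archive somedirectory somefile.tar.gz` would create the archive and print an OK line only if the listing succeeds.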

gzip

  (more information)

The standard compression system for all GNU software.

Examples:


  (compress file somefile - also removes uncompressed file)
$ gzip somefile

  (uncompress file somefile.gz - also removes compressed file)
$ gunzip somefile.gz
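
Because gzip replaces the original file by default, you may prefer to keep the uncompressed copy until you have checked the result. The sketch below assumes bash and a standard gzip; the function names gzip_keep and gzip_check are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: compress a file but keep the uncompressed original.
# gzip -c writes the compressed data to standard output instead of
# replacing the input file.
gzip_keep() {
    local file="$1"
    gzip -c "$file" > "$file.gz"
}

# Hypothetical helper: test a .gz file's integrity without extracting it.
gzip_check() {
    gzip -t "$1"
}
```

For example, `gzip_keep somefile && gzip_check somefile.gz` leaves both somefile and somefile.gz in place.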

bzip2

See the official documentation for bzip2 for more information.

Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.

Examples:


  (compress file somefile - also removes uncompressed file)
$ bzip2 somefile

  (uncompress file somefile.bz2 - also removes compressed file)
$ bunzip2 somefile.bz2

There are several other, less commonly used, options available as well:

  • zip
  • 7zip
  • xz

Environment Variables

Several environment variables are automatically defined for you to help you manage your storage. Use environment variables instead of actual paths whenever possible to avoid problems if the specific paths to any of these change.

Some of the environment variables you should have are:

Name            Description
HOME            /home/myusername
PWD             path to your current directory
RCAC_SCRATCH    /scratch/scholar/myusername

By convention, environment variable names are all uppercase. You may use them on the command line or in any scripts in place of and in combination with hard-coded values:

$ ls $HOME
...

$ ls $RCAC_SCRATCH/myproject
...

To find the value of any environment variable:

$ echo $RCAC_SCRATCH
/scratch/scholar/myusername 

To list the values of all environment variables:

$ env
USER=myusername
HOME=/home/myusername
RCAC_SCRATCH=/scratch/scholar/myusername 
...

You may create or overwrite an environment variable. To pass (export) the value of a variable in bash:

$ export MYPROJECT=$RCAC_SCRATCH/myproject

To assign a value to an environment variable in either tcsh or csh:

$ setenv MYPROJECT value
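
In a script, it can help to fail early if an expected variable is missing, rather than silently building a wrong path. The following is a minimal bash sketch; the function name project_dir and the directory name myproject are hypothetical placeholders:

```shell
#!/bin/bash
# Hypothetical helper: build a project path from RCAC_SCRATCH instead of
# a hard-coded path.
project_dir() {
    local name="$1"
    # ${VAR:?message} aborts with an error if the variable is unset or
    # empty, so the script never ends up using a path like "/myproject".
    echo "${RCAC_SCRATCH:?RCAC_SCRATCH is not set}/$name"
}
```

For example, `cd "$(project_dir myproject)"` would change into the project directory under your scratch space.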

Storage Options

File storage options on RCAC systems include long-term storage (home directories, depot, Fortress) and short-term storage (scratch directories, /tmp directory). Each option has different performance and intended uses, and some options vary from system to system as well. Daily snapshots of home directories are provided for a limited time for accidental deletion recovery. Scratch directories and temporary storage are not backed up and old files are regularly purged from scratch and /tmp directories. More details about each storage option appear below.

Home Directory

Home directories are provided for long-term file storage. Each user has one home directory. You should use your home directory for storing important program files, scripts, input data sets, critical results, and frequently used files. You should store infrequently used files on Fortress. Your home directory becomes your current working directory, by default, when you log in.

Your home directory physically resides on a dedicated storage system accessible only from Scholar. To find the path to your home directory, first log in, then immediately enter the following:

$ pwd
/home/myusername

Or from any subdirectory:

$ echo $HOME
/home/myusername

Please note that your Scholar home directory and its contents are exclusive to the Scholar cluster, including front-end hosts and compute nodes. This home directory is not available on any RCAC machine other than Scholar. There is no automatic copying or synchronization between home directories, but at your discretion you can manually copy all or parts of your main home to Scholar using one of the suggested methods.

Your home directory has a quota limiting the total size of files you may store within. For more information, refer to the Storage Quota / Limits section.

Lost File Recovery

Nightly snapshots for 7 days, weekly snapshots for 4 weeks, and monthly snapshots for 3 months are kept. This means you will find snapshots from the last 7 nights, the last 4 Sundays, and the first day of each of the last 3 months. Files are available going back between two and three months, depending on how long ago the most recent first of the month was. Snapshots beyond this are not kept. For additional security, you should store another copy of your files on more permanent storage, such as the Fortress HPSS Archive.

Performance

Your home directory is medium-performance, non-purged space suitable for tasks like sharing data, editing files, developing and building software, and many other uses.

Your home directory is not designed or intended for use as high-performance working space for running data-intensive jobs with heavy I/O demands.

Long-Term Storage

Long-term Storage or Permanent Storage is available to users on the High Performance Storage System (HPSS), an archival storage system, called Fortress. Program files, data files and any other files which are not used often, but which must be saved, can be put in permanent storage. Fortress currently has over 10PB of capacity.

For more information about Fortress, how it works, and user guides, and how to obtain an account:

Scratch Space

Scratch directories are provided for short-term file storage only. The quota of your scratch directory is much greater than the quota of your home directory. You should use your scratch directory for storing temporary input files which your job reads or for writing temporary output files which you may examine after execution of your job. You should use your home directory and Fortress for longer-term storage or for holding critical results. The hsi and htar commands provide easy-to-use interfaces into the archive and can be used to copy files into the archive interactively or even automatically at the end of your regular job submission scripts.

Files in scratch directories are not recoverable. Files in scratch directories are not backed up. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

Files in scratch directories that have not been accessed or modified in 60 days are purged. Owners of these files receive an email notice one week before removal. Be sure to regularly check your Purdue email account or set up mail forwarding to an email account you do regularly check. For more information, please refer to our Scratch File Purging Policy.
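
To get a rough idea of which files may be approaching the purge window, you could list files whose access and modification times are both old. This is only an approximation using GNU find, not the purge system's own logic; the function name list_stale_files and the 53-day default are assumptions chosen for illustration:

```shell
#!/bin/bash
# Hypothetical helper: list files whose last access AND last modification
# are both older than a given number of days.
# With find, "+N" means "more than N whole days ago".
list_stale_files() {
    local dir="$1"
    local days="${2:-53}"   # assumed default: about a week short of 60 days
    find "$dir" -type f -atime +"$days" -mtime +"$days"
}
```

For example, `list_stale_files "$RCAC_SCRATCH" 53` prints one path per line for files that match.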

All users may access scratch directories on Scholar. To find the path to your scratch directory:

$ findscratch
/scratch/scholar/myusername

The value of variable $RCAC_SCRATCH is your scratch directory path. Use this variable in any scripts. Your actual scratch directory path may change without warning, but this variable will remain current.

$ echo $RCAC_SCRATCH
/scratch/scholar/myusername

Scratch directories are specific to each cluster; that is, only the /scratch/scholar directory is available on Scholar front-end and compute nodes. No other scratch directories are available on Scholar.

Your scratch directory has a quota capping the total size and number of files you may store in it. For more information, refer to the Storage Quota / Limits section.

Performance

Your scratch directory is located on a high-performance, large-capacity parallel filesystem engineered to provide work-area storage optimized for a wide variety of job types. It is designed to perform well with data-intensive computations, while scaling well to large numbers of simultaneous connections.

/tmp Directory

/tmp directories are provided for short-term file storage only. Each front-end and compute node has a /tmp directory. Your program may write temporary data to the /tmp directory of the compute node on which it is running. That data is available for as long as your program is active. Once your program terminates, that temporary data is no longer available. When used properly, /tmp may provide faster local storage to an active process than any other storage option. You should use your home directory and Fortress for longer-term storage or for holding critical results.

Backups are not performed for the /tmp directory, and the system removes files from /tmp whenever space is low or whenever a reboot is needed. In the event of a disk crash or file purge, files in /tmp are not recoverable. You should copy any important files to more permanent storage.
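
One way to use /tmp safely is to work in a private temporary directory and copy the results out before the job ends. The following is a minimal bash sketch under that assumption; the function name run_in_tmp and the example destination are hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a private directory under /tmp,
# copy the results to a more permanent location, then clean up.
run_in_tmp() {
    local dest="$1"; shift          # where results should end up
    local workdir rc
    workdir=$(mktemp -d /tmp/job.XXXXXX) || return 1

    # Run the given command with the temporary directory as its
    # current working directory, and remember its exit status.
    ( cd "$workdir" && "$@" )
    rc=$?

    # Copy everything back before the temporary directory is removed.
    mkdir -p "$dest" && cp -r "$workdir"/. "$dest"/
    rm -rf "$workdir"
    return "$rc"
}
```

For example, `run_in_tmp "$RCAC_SCRATCH/results" ./myprogram` would run a (hypothetical) program in /tmp and leave its output files under your scratch directory.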

Sharing Files from Scholar

Scholar supports several methods for file sharing. Use the links below to learn more about these methods.

Sharing Data with Globus

Data on any RCAC resource can be shared with other users within Purdue or with collaborators at other institutions. Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions.

To share files, log in to https://transfer.rcac.purdue.edu, navigate to the endpoint (collection) of your choice, and follow instructions as described in Globus documentation on how to share data:

See also RCAC Globus presentation.

File Transfer

Scholar supports several methods for file transfer. Use the links below to learn more about these methods.

SCP

SCP (Secure CoPy) is a simple way of transferring files between two machines that use the SSH protocol. SCP is available as a protocol choice in some graphical file transfer programs and also as a command line program on most Linux, Unix, and Mac OS X systems. SCP can copy single files, but will also recursively copy directory contents if given a directory name.

After Aug 17, 2020, the community clusters will not support password-based authentication for login. Methods that can be used include two-factor authentication (Purdue Login) or SSH keys. If you do not have SSH keys installed, you would need to type your Purdue Login response into SCP's "Password" prompt.

Command-line usage:

You can transfer files both to and from Scholar while initiating an SCP session on either some other computer or on Scholar (in other words, directionality of connection and directionality of data flow are independent from each other). The scp command appears somewhat similar to the familiar cp command, with an extra user@host:file syntax to denote files and directories on a remote host. Either Scholar or another computer can be a remote.

  • Example: Initiating SCP session on some other computer (i.e. you are on some other computer, connecting to Scholar):

          (transfer TO Scholar)
          (Individual files) 
    $ scp  sourcefile  myusername@scholar.rcac.purdue.edu:somedir/destinationfile
    $ scp  sourcefile  myusername@scholar.rcac.purdue.edu:somedir/
          (Recursive directory copy)
    $ scp -pr sourcedirectory/  myusername@scholar.rcac.purdue.edu:somedir/
    
          (transfer FROM Scholar)
          (Individual files)
    $ scp  myusername@scholar.rcac.purdue.edu:somedir/sourcefile  destinationfile
    $ scp  myusername@scholar.rcac.purdue.edu:somedir/sourcefile  somedir/
          (Recursive directory copy)
    $ scp -pr myusername@scholar.rcac.purdue.edu:sourcedirectory  somedir/
    

    The -p flag is optional. When used, it will cause the transfer to preserve file attributes and permissions. The -r flag is required for recursive transfers of entire directories.

  • Example: Initiating SCP session on Scholar (i.e. you are on Scholar, connecting to some other computer):

          (transfer TO Scholar)
          (Individual files) 
    $ scp  myusername@another.computer.example.com:sourcefile  somedir/destinationfile
    $ scp  myusername@another.computer.example.com:sourcefile  somedir/
          (Recursive directory copy)
    $ scp -pr myusername@another.computer.example.com:sourcedirectory/  somedir/
    
          (transfer FROM Scholar)
          (Individual files)
    $ scp  somedir/sourcefile  myusername@another.computer.example.com:destinationfile
    $ scp  somedir/sourcefile  myusername@another.computer.example.com:somedir/
          (Recursive directory copy)
    $ scp -pr sourcedirectory  myusername@another.computer.example.com:somedir/
    

    The -p flag is optional. When used, it will cause the transfer to preserve file attributes and permissions. The -r flag is required for recursive transfers of entire directories.

Software (SCP clients)

Linux and other Unix-like systems:

  • The scp command-line program should already be installed.

Microsoft Windows:

  • MobaXterm
    Free, full-featured, graphical Windows SSH, SCP, and SFTP client.
  • Command-line scp program can be installed as part of Windows Subsystem for Linux (WSL), or Git-Bash.

Mac OS X:

  • The scp command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • Cyberduck is a full-featured and free graphical SFTP and SCP client.

Globus

Globus, previously known as Globus Online, is a powerful and easy to use file transfer service for transferring files virtually anywhere. It works within RCAC's various research storage systems; it connects between RCAC and remote research sites running Globus; and it connects research systems to personal systems. You may use Globus to connect to your home, scratch, and Fortress storage directories. Since Globus is web-based, it works on any operating system that is connected to the internet. The Globus Personal client is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used over the command line.

Globus Web:

  • Navigate to https://transfer.rcac.purdue.edu
  • Click "Proceed" to log in with your Purdue Career Account.
  • On your first login it will ask to make a connection to a Globus account. Accept the conditions.
  • Now you are at the main screen. Click "File Transfer" which will bring you to a two-panel interface (if you only see one panel, you can use the selector in the top-right corner to switch the view).
  • You will need to select one collection and file path on one side as the source, and the second collection on the other as the destination. This can be one of several Purdue endpoints, or another university, or even your personal computer (see Personal Client section below).

The RCAC collections are as follows. A search for "Purdue" will give you several suggested results you can choose from, or you can give a more specific search.

  • Home Directory storage: "Purdue Scholar Cluster - Home and Class Directories". You can also start typing "Purdue" and "Scholar" and it will suggest appropriate matches.
  • Scholar scratch storage: "Purdue Scholar Cluster - Scratch". You can also start typing "Purdue" and "Scholar" and it will suggest appropriate matches. From here you will need to navigate into the first letter of your username, and then into your username.
  • Class Directory storage: "Purdue Scholar Cluster - Home and Class Directories". You can also start typing "Purdue" and "Scholar" and it will suggest appropriate matches. Once on the endpoint, you will be able to navigate to /class/...... in the Path field.
  • Research Data Depot: "Purdue Research Computing - Data Depot". A search for "Depot" should provide appropriate matches to choose from.
  • Fortress: "Purdue Fortress HPSS Archive". A search for "Fortress" should provide appropriate matches to choose from.

From here, select a file or folder in either side of the two-pane window, and then use the arrows in the top-middle of the interface to instruct Globus to move files from one side to the other. You can transfer files in either direction. You will receive an email once the transfer is completed.

Globus Personal Client setup:

Globus Connect Personal is a small software tool you can install to make your own computer a Globus endpoint on its own. It is useful if you need to transfer files via Globus to and from your computer directly.

  • On the "Collections" page from earlier, click "Get Globus Connect Personal" or download a version for your operating system from here: Globus Connect Personal
  • Name this particular personal system and follow the setup prompts to create your Globus Connect Personal endpoint.
  • Your personal system is now available as a collection within the Globus transfer interface.

Globus Command Line:

Globus supports a command-line interface, allowing advanced automation of your transfers.

To use the recommended standalone Globus CLI application (the globus command):

Sharing Data with Outside Collaborators

Globus allows convenient sharing of data with outside collaborators. Data can be shared with collaborators' personal computers or directly with many other computing resources at other institutions. See the Globus documentation on how to share data:

For links to more information, please see Globus Support page and RCAC Globus presentation.

Windows Network Drive / SMB

SMB (Server Message Block), also known as CIFS, is an easy to use file transfer protocol that is useful for transferring files between RCAC systems and a desktop or laptop. You may use SMB to connect to your home, scratch, and Fortress storage directories. The SMB protocol is available on Windows, Linux, and Mac OS X. It is primarily used as a graphical means of transfer but it can also be used over the command line.

Note: to access Scholar through SMB file sharing, you must be on a Purdue campus network or connected through VPN.

Windows:

  • Windows 7: Click Windows menu > Computer, then click Map Network Drive in the top bar
  • Windows 8 & 10: Tap the Windows key, type computer, select This PC, click Computer > Map Network Drive in the top bar
  • In the folder location enter the following information and click Finish:
    • To access your Scholar home directory, enter \\home.scholar.rcac.purdue.edu\scholar-home.
    • To access your scratch space on Scholar, enter \\scratch.scholar.rcac.purdue.edu\scholar-scratch. Once mapped, you will be able to navigate to your scratch directory.
    • Note: Use your career account login name and password when prompted. (You will not need to add ",push" nor use your Purdue Duo client.)
  • Your home or scratch directory should now be mounted as a drive in the Computer window.

Mac OS X:

  • In the Finder, click Go > Connect to Server
  • In the Server Address enter the following information and click Connect:
    • To access your Scholar home directory, enter smb://home.scholar.rcac.purdue.edu/scholar-home.
    • To access your scratch space on Scholar, enter smb://scratch.scholar.rcac.purdue.edu/scholar-scratch. Once mapped, you will be able to navigate to your scratch directory.
    • Note: Use your career account login name and password when prompted. (You will not need to add ",push" nor use your Purdue Duo client.)

Linux:

  • There are several graphical methods to connect in Linux depending on your desktop environment. Once you find out how to connect to a network server on your desktop environment, choose the Samba/SMB protocol and adapt the information from the Mac OS X section to connect.
  • If you would like access via Samba on the command line, you may install smbclient, which will give you FTP-like access and can be used as shown below. For all the possible server addresses, look at the Mac OS X instructions.
    smbclient //home.scholar.rcac.purdue.edu/scholar-home -U myusername
    smbclient //scratch.scholar.rcac.purdue.edu/scholar-scratch -U myusername
  • Note: Use your career account login name and password when prompted. (You will not need to add ",push" nor use your Purdue Duo client.)

FTP / SFTP

FTP is not supported on any research systems because it does not allow for secure transmission of data. Use SFTP instead, as described below.

SFTP (Secure File Transfer Protocol) is a reliable way of transferring files between two machines. SFTP is available as a protocol choice in some graphical file transfer programs and also as a command-line program on most Linux, Unix, and Mac OS X systems. SFTP has more features than SCP and allows for other operations on remote files, remote directory listing, and resuming interrupted transfers. Command-line SFTP cannot recursively copy directory contents; to do so, try using SCP or a graphical SFTP client.

After Aug 17, 2020, the community clusters will not support password-based authentication for login. Methods that can be used include two-factor authentication (Purdue Login) or SSH keys. If you do not have SSH keys installed, you would need to type your Purdue Login response into the SFTP's "Password" prompt.

Command-line usage

You can transfer files both to and from Scholar while initiating an SFTP session on either some other computer or on Scholar (in other words, directionality of connection and directionality of data flow are independent from each other). Once the connection is established, you use put or get subcommands between "local" and "remote" computers. Either Scholar or another computer can be a remote.

  • Example: Initiating SFTP session on some other computer (i.e. you are on another computer, connecting to Scholar):

    $ sftp myusername@scholar.rcac.purdue.edu
    
          (transfer TO Scholar)
    sftp> put sourcefile somedir/destinationfile
    sftp> put -P sourcefile somedir/
    
          (transfer FROM Scholar)
    sftp> get sourcefile somedir/destinationfile
    sftp> get -P sourcefile somedir/
    
    sftp> exit
    

    The -P flag is optional. When used, it will cause the transfer to preserve file attributes and permissions.

  • Example: Initiating SFTP session on Scholar (i.e. you are on Scholar, connecting to some other computer):

    $ sftp myusername@another.computer.example.com
    
          (transfer TO Scholar)
    sftp> get sourcefile somedir/destinationfile
    sftp> get -P sourcefile somedir/
    
          (transfer FROM Scholar)
    sftp> put sourcefile somedir/destinationfile
    sftp> put -P sourcefile somedir/
    
    sftp> exit
    

    The -P flag is optional. When used, it will cause the transfer to preserve file attributes and permissions.

Software (SFTP clients)

Linux and other Unix-like systems:

  • The sftp command-line program should already be installed.

Microsoft Windows:

  • MobaXterm
    Free, full-featured, graphical Windows SSH, SCP, and SFTP client.
  • Command-line sftp program can be installed as part of Windows Subsystem for Linux (WSL), or Git-Bash.

Mac OS X:

  • The sftp command-line program should already be installed. You may start a local terminal window from "Applications->Utilities".
  • Cyberduck is a full-featured and free graphical SFTP and SCP client.

Copying files from Purdue IT research computing home directory to Scholar

The Scholar home directory and its contents are specific to the Scholar cluster, and are not available on other RCAC machines. For people having access to other Community Clusters and Scholar, there is no automatic copying or synchronization between main and Scholar home directories. At your discretion, you can manually copy all or parts of your main research computing home to Scholar using one of the methods described below.

Please note that copying may fail if your research computing home directory is larger than your Scholar home directory quota. Please check usage and limits before proceeding!

Complete copy

For your convenience, a custom tool copy-rcac-home is provided to simplify at-will duplication of your main research computing home directory into Scholar. The tool performs a complete 1-to-1 copy using rsync -auH (with the exception of a narrow subset of system-specific service files).

To use the tool, simply type copy-rcac-home in a terminal window on a Scholar front-end or compute node:

$ copy-rcac-home

   This script will copy entire contents of your main RCAC
   home directory into your Scholar cluster's $HOME.

   Note: copying may fail if the size of your RCAC home directory
   is larger than your quota on the Scholar one (25GB).
   BEFORE PROCEEDING, please run 'myquota' command on another
   cluster to see your usage there and judge whether it would fit!

Would you like to proceed? [Y/n]:

At this stage answering yes will proceed with copying, or you can respond with a no (or Ctrl-C) to cancel. See copy-rcac-home --help for more details on the tool.

Partial copy

Desired parts (or whole) of your research computing home directories can be copied to Scholar via any of the home directories' supported transfer methods, such as SCP, SFTP, rsync, or Globus.

  • Example: recursive copying of a subdirectory from RCAC home directory into Scholar home using scp.

       (if you are on Scholar, use other cluster name for the remote part)
    $ scp -pr myothercluster.rcac.purdue.edu:somedirectory/  ~/
    
       (if you are on another cluster, use Scholar for the remote part)
    $ scp -pr somedirectory/ myusername@scholar.rcac.purdue.edu:~/
    
  • Example: copying using Globus.

    Search collections for "Purdue Research Computing - Home Directories" and "Purdue Scholar Cluster - Home" endpoints, respectively, then transfer desired files and/or directories as usual.

Storage Quota / Limits

Some limits are imposed on your disk usage on research systems. A quota is implemented on each filesystem. Each filesystem (home directory, scratch directory, etc.) may have a different limit. If you exceed the quota, you will not be able to save new files or new data to the filesystem until you delete or move data to long-term storage.

Checking Quota

To check the current quotas of your home and scratch directories, check the My Quota page or use the myquota command:

$ myquota
Type        Filesystem          Size    Limit  Use         Files    Limit  Use
==============================================================================
home        myusername         5.0GB   25.0GB  20%             -        -   -
scratch     scholar        220.7GB  100.0TB  0.22%            8k   2,000k  0.43%

The columns are as follows:

  • Type: indicates home or scratch directory.
  • Filesystem: name of storage option.
  • Size: sum of file sizes in bytes.
  • Limit: allowed maximum on sum of file sizes in bytes.
  • Use: percentage of file-size limit currently in use.
  • Files: number of files and directories (not the size).
  • Limit: allowed maximum on number of files and directories. It is possible, though unlikely, to reach this limit and not the file-size limit if you create a large number of very small files.
  • Use: percentage of file-number limit currently in use.
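
If you are close to the file-number limit, counting entries per directory can show where the small files are. This is a rough sketch using find and wc; it is not how myquota itself counts, and the function name count_entries is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: count files and directories below a path,
# similar in spirit to the "Files" column described above.
count_entries() {
    local dir="$1"
    # -mindepth 1 excludes the starting directory itself from the count.
    find "$dir" -mindepth 1 | wc -l
}
```

For example, `count_entries "$RCAC_SCRATCH/myproject"` prints a single number for that (hypothetical) project directory.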

If you find that you have reached your quota in either your home directory or your scratch directory, obtain estimates of your disk usage. Find the top-level directories which have a high disk usage, then study the subdirectories to discover where the heaviest usage lies.

To see in a human-readable format an estimate of the disk usage of your top-level directories in your home directory:

$ du -h --max-depth=1 $HOME
32K     /home/myusername/mysubdirectory_1
529M    /home/myusername/mysubdirectory_2
608K    /home/myusername/mysubdirectory_3

The second directory is the largest of the three, so apply the du command to it next.

To see in a human-readable format an estimate of the disk usage of your top-level directories in your scratch file directory:

$ du -h --max-depth=1 $RCAC_SCRATCH
160K    /scratch/scholar/myusername

This strategy can be very helpful in figuring out the location of your largest usage. Move unneeded files and directories to long-term storage to free space in your home and scratch directories.
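
When a directory has many subdirectories, sorting the du output puts the heaviest ones first. The following is a minimal sketch, assuming GNU du and sort (for the --max-depth and -h options); the function name largest_dirs is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: show the largest immediate subdirectories of a path.
largest_dirs() {
    local dir="$1"
    local count="${2:-10}"    # how many lines to show (assumed default: 10)
    # sort -h understands the human-readable suffixes (K, M, G) printed
    # by du -h; -r puts the largest entries first. du also prints a total
    # for the starting directory itself, which appears in the list.
    du -h --max-depth=1 "$dir" | sort -rh | head -n "$count"
}
```

For example, `largest_dirs "$HOME" 5` lists the five largest entries, after which you can run it again on the heaviest subdirectory.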

Increasing Quota

Home Directory

If you find you need additional disk space in your home directory, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. Unfortunately, it is not possible to increase your home directory quota beyond its current level.

Scratch Space

If you find you need additional disk space in your scratch space, please first consider archiving and compressing old files and moving them to long-term storage on the Fortress HPSS Archive. If you are unable to do so, you may ask for a quota increase by contacting support.

Lost File Recovery

Scholar is protected against accidental file deletion through a series of snapshots taken every night just after midnight. Each snapshot provides the state of your files at the time the snapshot was taken. It does so by storing only the files which have changed between snapshots. A file that has not changed between snapshots is only stored once but will appear in every snapshot. This is an efficient method of providing snapshots because the snapshot system does not have to store multiple copies of every file.

These snapshots are kept for a limited time at various intervals. RCAC keeps nightly snapshots for 7 days, weekly snapshots for 4 weeks, and monthly snapshots for 3 months. This means you will find snapshots from the last 7 nights, the last 4 Sundays, and the first day of each of the last 3 months. Files are available going back between two and three months, depending on how long ago the most recent first of the month was. Snapshots beyond this are not kept.

Only files which have been saved during an overnight snapshot are recoverable. If you lose a file the same day you created it, the file is not recoverable because the snapshot system has not had a chance to save the file.

Snapshots are not a substitute for regular backups. It is the responsibility of researchers to back up any important data to the Fortress Archive. Scholar does protect against hardware failures and physical disasters through other means; however, these are also not substitutes for backups.

Files in scratch directories are not backed up and are not recoverable. If you accidentally delete a file, a disk crashes, or old files are purged, they cannot be restored.

Scholar offers several ways for researchers to access snapshots of their files.

flost

If you know when you lost the file, the easiest way is to use the flost command. This tool is available from any RCAC resource. If you do not have access to a compute cluster, any Data Depot user may use an SSH client to connect to scholar.rcac.purdue.edu and run this command.

To run the tool you will need to specify the location where the lost file was with the -w argument:

$ flost -w /depot/mylab

Replace mylab with the name of your lab's Scholar directory. If you know more specifically where the lost file was, you may provide the full path to that directory instead.

This tool will prompt you for the date on which you lost the file or would like to recover the file from. If the tool finds an appropriate snapshot it will provide instructions on how to search for and recover the file.

If you are not sure what date you lost the file, you may try entering different dates into flost to find it, or you may manually browse the snapshots as described below.

Manual Browsing

You may also search through the snapshots by hand on the Scholar filesystem if you are not sure what date you lost the file or simply prefer to browse. Snapshots can be browsed from any RCAC resource. If you do not have access to a compute cluster, any Scholar user may use an SSH client to connect to scholar.rcac.purdue.edu and browse from there. The snapshots are located at /depot/.snapshots on these resources.

You can also mount the snapshot directory over Samba (or SMB, CIFS) on Windows or Mac OS X. Mount (or map) the snapshot directory in the same way as you did for your main Scholar space, but use the server name and path \\datadepot.rcac.purdue.edu\depot\.winsnaps (Windows) or smb://datadepot.rcac.purdue.edu/depot/.winsnaps (Mac OS X).

Once connected to the snapshot directory through SSH or Samba, you will see something similar to this:

Snapshot folders may look slightly different when accessed via SSH on scholar.rcac.purdue.edu or via Samba on datadepot.rcac.purdue.edu. Here are examples of both.

SSH to scholar.rcac.purdue.edu:

$ cd /depot/.snapshots
$ ls -1
daily_20190129000501
daily_20190130000501
daily_20190131000502
daily_20190201000501
daily_20190202000501
daily_20190203000501
daily_20190204000501
monthly_20181101001501
monthly_20181201001501
monthly_20190101001501
monthly_20190201001501
weekly_20190113002501
weekly_20190120002501
weekly_20190127002501
weekly_20190203002501

Samba mount on datadepot.rcac.purdue.edu:

(screenshot: Scholar snapshots via Samba)

Each of these directories is a snapshot of the entire Scholar filesystem at the timestamp encoded into the directory name. The timestamp format is four digits for the year, two digits for the month, two digits for the day, followed by the time of day. For example, daily_20190129000501 was taken on January 29, 2019, shortly after midnight.
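
As an illustration of the naming scheme, the following sketch splits a snapshot directory name into its interval and a readable timestamp. It assumes the name has the form interval_YYYYMMDDhhmmss, which matches the example listing but is an interpretation rather than a documented format. The function name describe_snapshot is made up for this example.

```shell
# Split a name such as daily_20190129000501 into interval and timestamp.
# Assumes <interval>_<YYYYMMDDhhmmss>, inferred from the example listing.
describe_snapshot() {
    name=$1
    interval=${name%%_*}    # text before the underscore: daily, weekly, monthly
    stamp=${name#*_}        # the 14 digits after the underscore
    y=$(printf '%s' "$stamp" | cut -c1-4)
    mo=$(printf '%s' "$stamp" | cut -c5-6)
    d=$(printf '%s' "$stamp" | cut -c7-8)
    h=$(printf '%s' "$stamp" | cut -c9-10)
    mi=$(printf '%s' "$stamp" | cut -c11-12)
    s=$(printf '%s' "$stamp" | cut -c13-14)
    echo "$interval $y-$mo-$d $h:$mi:$s"
}

describe_snapshot daily_20190129000501
# prints: daily 2019-01-29 00:05:01
```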

You may cd into any of these directories where you will find the entire Scholar filesystem. Use cd to continue into your lab's Scholar space and then you may browse the snapshot as normal.

If you are browsing these directories over a Samba network drive you can simply drag and drop the files over into your live Data Depot folder.

Once you find the file you are looking for, use cp to copy the file back into your lab's live Scholar space. Do not attempt to modify files directly in the snapshot directories.
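
When browsing over SSH, a small helper can show which snapshots still contain a particular file before you copy it back. This is a sketch under stated assumptions: the function name find_in_snapshots is made up, and it assumes the file sits at the same relative path inside each snapshot directory as it did in the live filesystem. The lab name and file path in the usage comments are placeholders.

```shell
# List every snapshot under a snapshot root that contains a given file,
# showing the file's size and modification time in that snapshot.
#   $1: snapshot root, e.g. /depot/.snapshots
#   $2: path of the file relative to the top of each snapshot
find_in_snapshots() {
    snaproot=$1
    relpath=$2
    for snap in "$snaproot"/*/; do
        if [ -f "$snap$relpath" ]; then
            ls -l "$snap$relpath"
        fi
    done
}

# Usage (placeholder paths; adjust to your lab and file):
#   find_in_snapshots /depot/.snapshots mylab/data/results.csv
# Then copy the chosen version back into the live space:
#   cp /depot/.snapshots/daily_20190203000501/mylab/data/results.csv /depot/mylab/data/
```

Comparing the modification times in the output helps pick the newest snapshot that still holds the version you want.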

Windows

If you use Scholar through "network drives" on Windows you may recover lost files directly from within Windows:

  • Open the folder that contained the lost file.
  • Right-click inside the window and select "Properties".
  • Click on the "Previous Versions" tab.
  • A list of snapshots will be displayed.
  • Select the snapshot from which you wish to restore.
  • In the new window, locate the file you wish to restore.
  • Drag the file or folder to its correct location.

In the "Previous Versions" window the list contains two columns. The first column is the timestamp at which the snapshot was taken. The second column is the date on which the selected file or folder was last modified in that snapshot. This may give you some extra clues as to which snapshot contains the version of the file you are looking for.

Mac OS X

Mac OS X does not provide any way to access the Scholar snapshots directly. To access the snapshots there are two options: browse the snapshots by hand through a network drive mount or use an automated command-line based tool.

To browse the snapshots by hand, follow the directions outlined in the Manual Browsing section.

To use the automated command-line tool, log into a compute cluster or into the host scholar.rcac.purdue.edu (which is available to all Scholar users) and use the flost tool. On Mac OS X you can use the built-in Terminal application to connect over SSH.

  • Open the Applications folder from Finder.
  • Navigate to the Utilities folder.
  • Double click the Terminal application to open it.
  • Type the following command when the terminal opens.
    $ ssh myusername@scholar.rcac.purdue.edu
    Replace myusername with your Purdue career account username and provide your password when prompted.

Once logged in, use the flost tool as described above. The tool will guide you through the process and show you the commands necessary to retrieve your lost file.
