Lost File Recovery

Data Depot is protected against accidental file deletion through a series of snapshots taken every night just after midnight. Each snapshot provides the state of your files at the time the snapshot was taken. It does so by storing only the files which have changed between snapshots. A file that has not changed between snapshots is only stored once but will appear in every snapshot. This is an efficient method of providing snapshots because the snapshot system does not have to store multiple copies of every file.

These snapshots are kept for a limited time at various intervals. ITaP keeps nightly snapshots for 7 days, weekly snapshots for 4 weeks, and monthly snapshots for 3 months. This means you will find snapshots from the last 7 nights, the last 4 Sundays, and the first day of each of the last 3 months. Files are therefore available going back between two and three months, depending on how long ago the most recent first of the month was. Snapshots older than this are not kept.

Only files which have been saved during an overnight snapshot are recoverable. If you lose a file the same day you created it, the file is not recoverable because the snapshot system has not had a chance to save the file.

Snapshots are not a substitute for regular backups. It is the responsibility of researchers to back up any important data to the Fortress Archive. Data Depot does protect against hardware failures and physical disasters through other means; however, these are also not substitutes for backups.

Data Depot offers several ways for researchers to access snapshots of their files.

flost

If you know when you lost the file, the easiest way is to use the flost command. This tool is available from any ITaP Research Computing resource. If you do not have access to a compute cluster, any Data Depot user may use an SSH client to connect to data.rcac.purdue.edu and run this command.

To run the tool you will need to specify the location where the lost file was with the -w argument:

$ flost -w /depot/mylab

Replace mylab with the name of your lab's Data Depot directory. If you know more specifically where the lost file was you may provide the full path to that directory.

This tool will prompt you for the date on which you lost the file or would like to recover the file from. If the tool finds an appropriate snapshot it will provide instructions on how to search for and recover the file.

If you are not sure what date you lost the file, you may try entering different dates into flost to find it, or you may manually browse the snapshots as described below.

Manual Browsing

You may also search through the snapshots by hand on the Data Depot filesystem if you are not sure what date you lost the file or would like to browse by hand. Snapshots can be browsed from any ITaP Research Computing resource. If you do not have access to a compute cluster, any Data Depot user may use an SSH client to connect to data.rcac.purdue.edu and browse from there. The snapshots are located at /depot/.snapshots on these resources.

You can also mount the snapshot directory over Samba (also known as SMB or CIFS) on Windows or Mac OS X. Mount (or map) the snapshot directory in the same way as you did for your main Data Depot space, but use the following server name and path instead: \\datadepot.rcac.purdue.edu\depot\.winsnaps (Windows) or smb://datadepot.rcac.purdue.edu/depot/.winsnaps (Mac OS X).

Once connected to the snapshot directory through SSH or Samba, you will see something similar to this:

Snapshot folders may look slightly different when accessed via SSH on data.rcac.purdue.edu or via Samba on datadepot.rcac.purdue.edu. Here are examples of both.

SSH to data.rcac.purdue.edu:
$ cd /depot/.snapshots
$ ls -1
daily_20190129000501
daily_20190130000501
daily_20190131000502
daily_20190201000501
daily_20190202000501
daily_20190203000501
daily_20190204000501
monthly_20181101001501
monthly_20181201001501
monthly_20190101001501
monthly_20190201001501
weekly_20190113002501
weekly_20190120002501
weekly_20190127002501
weekly_20190203002501

(Screenshot: Data Depot snapshots via Samba mount on datadepot.rcac.purdue.edu)

Each of these directories is a snapshot of the entire Data Depot filesystem at the timestamp encoded into the directory name. The format for this timestamp is a four-digit year, a two-digit month, and a two-digit day, followed by the time of day. For example, daily_20190129000501 is the nightly snapshot taken just after midnight on January 29, 2019.
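To make the naming scheme easier to read at a glance, the following is a minimal sketch of a shell helper that turns a snapshot directory name into a readable date and time. The function name snapshot_time is hypothetical and is not part of any Data Depot tooling. It also assumes that the last six digits of the name are hours, minutes, and seconds, which is inferred from the sample listing above rather than stated explicitly.

```shell
# snapshot_time NAME
# Hypothetical helper: print the timestamp encoded in a snapshot
# directory name such as daily_20190129000501.
# Assumption: the 14 digits are YYYYMMDD followed by HHMMSS.
snapshot_time() {
    # Drop the leading "daily_", "weekly_" or "monthly_" prefix.
    st_stamp=${1#*_}
    # Re-punctuate the 14 digits. A name that does not match the
    # expected pattern is printed back unchanged.
    printf '%s\n' "$st_stamp" | sed -E \
        's/^([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})$/\1-\2-\3 \4:\5:\6/'
}
```

For example:

$ snapshot_time daily_20190129000501
2019-01-29 00:05:01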

You may cd into any of these directories where you will find the entire Data Depot filesystem. Use cd to continue into your lab's Data Depot space and then you may browse the snapshot as normal.
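Instead of changing into each snapshot in turn, you could check all of them for a particular file in one pass. The sketch below is one possible way to do that. The function name find_in_snapshots is hypothetical, and it assumes each snapshot directory contains your lab's space at the same relative path as the live filesystem (for example mylab/... under /depot/.snapshots/daily_.../).

```shell
# find_in_snapshots REL_PATH [SNAP_ROOT]
# Hypothetical helper: print the full path of REL_PATH inside every
# snapshot that contains it.
# REL_PATH is relative to the top of the Data Depot filesystem,
# e.g. mylab/results/data.csv (an illustrative file name).
# SNAP_ROOT defaults to /depot/.snapshots.
find_in_snapshots() {
    fis_rel=$1
    fis_root=${2:-/depot/.snapshots}
    # The trailing slash in the pattern restricts matches to directories.
    for fis_snap in "$fis_root"/*/; do
        fis_candidate=${fis_snap}${fis_rel}
        if [ -e "$fis_candidate" ]; then
            printf '%s\n' "$fis_candidate"
        fi
    done
}
```

For example, to see which snapshots still hold a (hypothetical) file and when each copy was last modified:

$ find_in_snapshots mylab/results/data.csv | xargs ls -l

Results are listed in alphabetical order of snapshot name (daily, then monthly, then weekly), not in chronological order.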

If you are browsing these directories over a Samba network drive you can simply drag and drop the files over into your live Data Depot folder.

Once you find the file you are looking for, use cp to copy the file back into your lab's live Data Depot space. Do not attempt to modify files directly in the snapshot directories.
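When copying a file back, it is easy to overwrite a newer live file by mistake. As a cautious sketch, the hypothetical wrapper below (restore_copy is not a Data Depot command) refuses to copy if the destination already exists and otherwise runs cp -p, which preserves the file's modification time and permissions.

```shell
# restore_copy SRC DEST
# Hypothetical helper: copy one file out of a snapshot into live space.
# SRC  is the file inside a snapshot directory.
# DEST is the full path of the file to create in live Data Depot space.
restore_copy() {
    rc_src=$1
    rc_dest=$2
    # Never overwrite anything that already exists at the destination.
    if [ -e "$rc_dest" ]; then
        echo "refusing to overwrite existing path: $rc_dest" >&2
        return 1
    fi
    # -p keeps the original modification time and permissions.
    cp -p "$rc_src" "$rc_dest"
}
```

For example, with an illustrative file path:

$ restore_copy /depot/.snapshots/daily_20190203000501/mylab/results/data.csv /depot/mylab/results/data.csv

If the live file still exists and you want to keep both versions, give the restored copy a different name as DEST.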

Windows

If you use Data Depot through "network drives" on Windows you may recover lost files directly from within Windows:

  • Open the folder that contained the lost file.
  • Right click inside the window and select "Properties".
  • Click on the "Previous Versions" tab.
  • A list of snapshots will be displayed.
  • Select the snapshot from which you wish to restore.
  • In the new window, locate the file you wish to restore.
  • Drag the file or folder to its correct location.

In the "Previous Versions" window the list contains two columns. The first column is the time at which the snapshot was taken. The second column is the date on which the selected file or folder was last modified in that snapshot. This may give you some extra clues as to which snapshot contains the version of the file you are looking for.

Mac OS X

Mac OS X does not provide any way to access the Data Depot snapshots directly. To access the snapshots there are two options: browse the snapshots by hand through a network drive mount or use an automated command-line based tool.

To browse the snapshots by hand, follow the directions outlined in the Manual Browsing section.

To use the automated command-line tool, log into an ITaP Research Computing cluster or into the host data.rcac.purdue.edu (which is available to all Data Depot users) and use the flost tool. On Mac OS X you can use the built-in SSH terminal application to connect.

  • Open the Applications folder from Finder.
  • Navigate to the Utilities folder.
  • Double click the Terminal application to open it.
  • Type the following command when the terminal opens.
    $ ssh myusername@data.rcac.purdue.edu
    Replace myusername with your Purdue career account username and provide your password when prompted.

Once logged in use the flost tool as described above. The tool will guide you through the process and show you the commands necessary to retrieve your lost file.
