
File Storage and Transfer

Overview of File Storage and Transfer

Purdue provides various file storage resources to TeraGrid users and offers GridFTP access for data transfer to and from those resources. The sections below describe these resources in more detail and give guidelines for their proper use.

Home Directories

Your home directory is the default directory you are placed in when you log on.

You should use this space for files you want to keep long term, such as source code, scripts, and input data sets, and for files you use often. The home directory physically resides on BlueArc storage. You can find the path to your home directory by logging on to tg-login.purdue.teragrid.org and typing "pwd".

bash-3.00$ pwd
/home/ba01/u103/user123

The second component of the reply is the name of the host where your home directory physically resides. In this example, the home directory is on the RCAC home directory file server named "ba01", under area "u103". These values differ from user to user. You can always check where your home directory is located by running "pwd" in your home directory.
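If you want to pick out the file server name in a script, the following is a minimal sketch. It assumes the path layout shown in the example above (/home/server/area/user); the function name home_server is made up for illustration and is not an RCAC command.

```shell
# home_server is a hypothetical helper name, not an RCAC command.
# It prints the second path component, which in the layout shown above
# (/home/<server>/<area>/<user>) is the file server name.
home_server() {
    # Field 1 is empty (the path starts with "/"), field 2 is "home",
    # so the server name is field 3.
    printf '%s\n' "$1" | cut -d/ -f3
}

home_server /home/ba01/u103/user123   # prints "ba01"
# On a login node you could run: home_server "$(cd && pwd)"
```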

Regardless of its physical location, your home directory and its contents are available on all the machines and their nodes via the Network File System (NFS).

The command to see your disk usage and limits is "quota". The command "du -h" can also be used to get an idea of how much space you use. Home directories are backed up daily.
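To see which items take up the most space, "du" can be combined with "sort". The sketch below is one possible way to do this; the function name largest_entries is hypothetical, and it reports sizes in 1024-byte blocks, the same unit "quota" uses.

```shell
# largest_entries is a hypothetical helper name.
# Usage: largest_entries [DIRECTORY] [COUNT]
# Lists the COUNT largest entries directly under DIRECTORY, largest first.
# Hidden files (names starting with ".") are not matched by the "*" glob.
largest_entries() {
    local dir count
    dir=${1:-$HOME}
    count=${2:-10}
    # -s: one total per entry; -k: sizes in 1024-byte blocks.
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example: the five largest entries in your home directory.
# largest_entries "$HOME" 5
```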

Quotas and limits are measured in 1024-byte blocks. Note that the BlueArc file systems do not distinguish between soft and hard limits, nor do they have grace periods. Thus, as you can see in the example below, the "quota" column reports "0" as your soft quota. You will not get any warning about exceeding a soft quota; instead, you will simply get an error when you reach your quota, which is reported in the "limit" column. This is true both for home directories on our systems and for scratch directories in the BlueArc scratch file systems (scratch95 and scratch96).

Here is an example of running the "quota" command on tg-login:

bash-3.00$ quota
Disk quotas for user user123 (uid 13185):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
     ba12:/scratch96
                      0       0 250000000             3       0  100000
     ba02.rcac.purdue.edu:/apps/recycled
                288130368     0 524288000        4490829      0       0
     ba01:/u103 2660608       0 5000000           18097       0   65535

The "Filesystem" column indicates the file system for which the quota is being reported. The next four columns indicate the account's current disk usage, its soft quota and hard limit, and its grace period. (If your usage is under your soft limit, the grace period will be blank.) The last four columns show similar information about the number of files the account has created in the file system.

If the soft quota is exceeded, an asterisk is placed next to the disk usage and a time period appears under "grace", showing how much time is left before the grace period expires.
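As a rough illustration of reading these numbers, the sketch below turns a usage figure and a limit (both in 1024-byte blocks, as printed by "quota") into a percentage. The function name quota_percent is hypothetical; the sample values come from the ba01:/u103 line of the example output.

```shell
# quota_percent is a hypothetical helper, not a TeraGrid or RCAC command.
# Usage: quota_percent BLOCKS_USED BLOCK_LIMIT
# Both arguments are 1024-byte block counts as printed by "quota".
quota_percent() {
    local used=$1 limit=$2
    # A limit of 0 means no limit is reported for that file system.
    if [ "$limit" -le 0 ]; then
        echo "no limit"
        return 0
    fi
    # Integer arithmetic; the result is rounded down.
    echo "$(( used * 100 / limit ))%"
}

# 2660608 blocks used out of a limit of 5000000 blocks.
quota_percent 2660608 5000000   # prints "53%"
```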

Scratch Directories

Scratch file systems (shared temporary file systems) are intended for short-term use and should be considered volatile.

Please note that backups are not performed on the scratch directories. In the event of a disk crash or file purge, files on the scratch directories cannot be recovered. Therefore, you should make sure to back up your files to permanent storage as often as significant changes are made (at least daily). Files stored in the RCAC scratch storage areas will be purged after 60 days.
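One way to follow this advice is to archive a results directory from scratch into permanent storage. The following is only a sketch: the function name backup_results and the example paths are made up, and you should choose a destination that suits your own quota.

```shell
# backup_results is a hypothetical helper name.
# Usage: backup_results SOURCE_DIR DEST_DIR
# Writes DEST_DIR/<name>-<YYYYMMDD>.tar.gz and prints the archive path.
backup_results() {
    local src dest name stamp archive
    src=$1
    dest=$2
    name=$(basename "$src")
    stamp=$(date +%Y%m%d)
    archive="$dest/$name-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C changes to the parent directory so the archive holds relative paths.
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    echo "$archive"
}

# Example (placeholder paths):
# backup_results "$RCAC_SCRATCH/run42" "$HOME/backups"
```

Keep in mind that the home directory limit in the quota example above is far smaller than the scratch quota, so large data sets may not fit there.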

The scratch storage is provided by a BlueArc server. The same scratch storage is shared by all systems, rather than each system having its own separate area. There are two scratch file systems, scratch95 and scratch96. A scratch directory will already have been created for you on one of them. User scratch directories are located in subdirectories of scratch95 or scratch96 that are named with the first letter of the user's login name.

To find the path to your scratch directory, run the command "myscratch":

bash-3.00$ myscratch
/scratch/scratch96/b/user123

or type "echo $RCAC_SCRATCH" at the command prompt:

bash-3.00$ echo $RCAC_SCRATCH
/scratch/scratch96/b/user123

When referring to your scratch space in scripts, you should always use either the variable $RCAC_SCRATCH or $TG_CLUSTER_SCRATCH, since the actual path may change, but the variables will always be set correctly. The two variables point to the same place.

bash-3.00$ echo $TG_CLUSTER_SCRATCH
/scratch/scratch96/b/user123
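In a script, this might look like the sketch below, which reads the scratch path from the variables instead of hard-coding it. The function name scratch_dir is hypothetical, and the error handling is just one possible choice.

```shell
# scratch_dir is a hypothetical helper name.
# Prints the scratch path from $RCAC_SCRATCH, falling back to
# $TG_CLUSTER_SCRATCH, and fails if neither variable is set.
scratch_dir() {
    local dir
    dir=${RCAC_SCRATCH:-${TG_CLUSTER_SCRATCH:-}}
    if [ -z "$dir" ]; then
        echo "scratch_dir: neither RCAC_SCRATCH nor TG_CLUSTER_SCRATCH is set" >&2
        return 1
    fi
    echo "$dir"
}

# Example use in a job script ("run42" is a placeholder name):
# cd "$(scratch_dir)/run42" || exit 1
```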

To find the path to someone else's scratch directory, run the command "findscratch XXX", where "XXX" is the login id you are interested in:

bash-3.00$ findscratch user123
/scratch/scratch96/b/user123

Note: Each user has a quota in their scratch directory. By default, this quota is 250 GB. If you need more space, send a request to rcac-help@purdue.edu.

/tmp Directory

The /tmp directory is intended for temporary files that are used during the execution of a process or job, or while you examine files created by your jobs. Please do not use this directory for longer-term storage of user files.

Since files in /tmp are removed whenever space is low and when the system is rebooted, you should only use it for files that can be recreated relatively easily. Files that are difficult or expensive to recreate should be stored elsewhere, such as your home directory. Files placed in /tmp may be purged at any time.

The /tmp directory on each node is purged whenever the node is rebooted. Its files are also removed when the node is assigned to a job by the PBS job scheduler. This ensures that each job has access to as much /tmp space as possible when it begins execution.

Always use the variable $TG_NODE_SCRATCH to refer to /tmp in all scripts.
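The sketch below shows one way a script might use $TG_NODE_SCRATCH for short-lived working files and clean up afterwards. The function name run_in_node_scratch is hypothetical, and the fallback to /tmp when the variable is unset is an assumption made so the example can run anywhere.

```shell
# run_in_node_scratch is a hypothetical helper name.
# Usage: run_in_node_scratch COMMAND [ARGS...]
# Runs COMMAND inside a fresh directory under $TG_NODE_SCRATCH and
# removes that directory afterwards.
run_in_node_scratch() {
    local base workdir status
    # Assumption: fall back to /tmp if the variable is not set.
    base=${TG_NODE_SCRATCH:-/tmp}
    workdir=$(mktemp -d "$base/job.XXXXXX") || return 1
    status=0
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$workdir" && "$@" ) || status=$?
    rm -rf "$workdir"
    return "$status"
}

# Example: run a command that writes temporary files in the node-local area.
# run_in_node_scratch ./my_program input.dat
```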

TeraGrid Environment Variables

You can see all your environment variables by typing "env" at the command prompt. Some that are particularly important for TeraGrid are:

  • $GLOBUS_PATH
  • $GLOBUS_LOCATION
  • $TG_CLUSTER_HOME (your home directory)
  • $RCAC_SCRATCH
  • $TG_NODE_SCRATCH ("/tmp")
  • $CONDOR_LOCATION
  • $TG_EXAMPLES
  • $TG_APPS_PREFIX
  • $TG_BASE
  • $TG_CLUSTER_SCRATCH (same as $RCAC_SCRATCH)
  • $TG_COMMUNITY
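To check just these variables rather than reading the full "env" output, something like the following sketch could be used. The function name show_tg_env is hypothetical; the variable names are the ones listed above.

```shell
# show_tg_env is a hypothetical helper name.
# Prints each variable from the list above as NAME=value,
# or NAME=(not set) when the variable is unset or empty.
show_tg_env() {
    local name value
    for name in GLOBUS_PATH GLOBUS_LOCATION TG_CLUSTER_HOME RCAC_SCRATCH \
                TG_NODE_SCRATCH CONDOR_LOCATION TG_EXAMPLES TG_APPS_PREFIX \
                TG_BASE TG_CLUSTER_SCRATCH TG_COMMUNITY; do
        # Look up the variable whose name is stored in $name.
        # eval is safe here because the names come from the fixed list above.
        eval "value=\${$name-}"
        printf '%s=%s\n' "$name" "${value:-(not set)}"
    done
}

show_tg_env
```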

File Transfer

If you need to move a file, such as your certificate, to another site, you can do so either with traditional file transfer programs or with GridFTP programs. Here is how you might use the GridFTP program tgcp:

tgcp local_filename remote.system.name:/path/to/where_you_want_the_file/
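In a script, it can help to confirm that the local file is readable before starting the transfer. The sketch below wraps the command form shown above and uses no tgcp options; the function name send_file is hypothetical, and the host and paths in the example are placeholders.

```shell
# send_file is a hypothetical wrapper name around the tgcp form shown above.
# Usage: send_file LOCAL_FILE REMOTE_HOST:/remote/path/
send_file() {
    local file target
    file=$1
    target=$2
    # Stop early if the local file is missing or unreadable.
    if [ ! -r "$file" ]; then
        echo "send_file: cannot read $file" >&2
        return 1
    fi
    tgcp "$file" "$target"
}

# Example (placeholder names):
# send_file results.tar.gz remote.system.name:/path/to/where_you_want_the_file/
```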

The TeraGrid site also has more information on GridFTP programs and their use.