Fortress - Getting Started

Conventions Used in this Document

Throughout this document, some typesetting and naming conventions will be used. Here are a few you may wish to note:

  • Colored, underlined text indicates a link.
  • Colored, bold text highlights something of particular importance.
  • Italicized text notes the first use of a key concept or term.
  • Bold, fixed-width font text indicates a command or command argument that may be typed verbatim.
  • Examples of commands and output as you would see them on the command line will appear in colored blocks of fixed-width text such as this:
    $ example
    This is an example of commands and output.
    
  • All command line shell prompts will be shown as a single dollar sign ("$"). Your actual shell prompt may differ.
  • All names that begin with "my" are intended as examples that must be replaced by the appropriate name. These include "myusername", "myfilename", "mydirectory", "myjobid", etc.

Overview of Fortress

The Fortress system is a large, long-term, multi-tiered file caching and storage system utilizing both online disk and robotic tape drives. ITaP upgraded Fortress from DXUL to HPSS in October of 2011.

Detailed Hardware Specification

Fortress uses an ADIC Scalar 10K robotic tape library from Quantum with a maximum capacity of 6.6 PB. Currently, Fortress has a usable capacity of 1.3 PB.

Storage Subsystem      Current Capacity  Hardware
Disk Cache             25 TB             2 IBM DS3512 Storage arrays
File Metadata Storage  7 TB              IBM DS3524 Storage with EXP3524 expansion tray
Long-Term Storage      1.3 PB            LTO-IV Robotic Tape Library

All files stored on Fortress reside on at least two separate storage devices:

  • One copy is permanently on tape.
  • Recently used files and files smaller than 100 MB have their primary copy stored on a conventional spinning disk storage array (disk cache). Disk cache provides a rapid restore time.

Both primary and secondary copies of larger files reside on separate tape cartridges in the robotic tape library. After a period of inactivity, HPSS will migrate files from disk cache to tape.

To ensure optimal performance for all users, and to keep the Fortress system healthy, please remember the following tips:

  • Fortress operates most effectively with large files - 1GB or larger. If your data consists of smaller files, use HTAR to directly create archives in Fortress.
  • When working with files on cluster head nodes, use your home directory or a scratch file system, rather than editing or computing on files directly in Fortress. Copy any data you wish to archive to Fortress after computation is complete.
  • The HPSS software does not handle sparse files (files with empty space) in an optimal manner. Therefore, if you must copy a sparse file into HPSS, use HSI rather than the cp or mv commands.
  • Due to the sparse files issue, the rsync command should not be used to copy data into Fortress through NFS, as this may cause problems with the system.
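
As a rough illustration of the first tip, the sketch below counts the files in a directory that fall under a size threshold, which can help decide whether the directory is better bundled with HTAR than copied file by file. The helper name count_small_files is hypothetical, and the commented htar line assumes the usual tar-like options (-c, -v, -f) and a working keytab.

```shell
# Count regular files under DIR that are smaller than LIMIT bytes.
# count_small_files is a hypothetical helper, not part of HPSS.
count_small_files() {
    dir=$1
    limit=$2
    # "-size -Nc" matches files smaller than N bytes.
    find "$dir" -type f -size "-${limit}c" | wc -l | tr -d ' '
}

# Example: how many files in mydirectory are under 1 GB (1073741824 bytes)?
#   count_small_files mydirectory 1073741824
# If most files are small, bundle them directly into Fortress
# (assumes htar is installed on this system):
#   htar -cvf mydirectory.tar mydirectory
```

The count is only a guide; file names containing newlines would be counted more than once.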

Fortress runs Red Hat Enterprise Linux, version 5.6 and uses HPSS 7.3.3p1 from IBM and the HPSS Collaboration.

Accounts on Fortress

Obtaining an Account

All Purdue faculty, staff, and students participating in the Community Cluster program have access to Fortress along with their cluster nodes and scratch space.

Additionally, all Purdue faculty, staff, and students with the approval of their advisor may also request access to Fortress. Refer to the Accounts / Access page for more details on how to request access.

Research groups may, upon request, be assigned a group data storage space within Fortress to facilitate sharing of research data. Access to this space is controlled through a UNIX group assigned to the research group. Faculty should contact ITaP at rcac-help@purdue.edu to create a shared space for their research group.

ITaP research computing resources are not intended to store data protected by Federal privacy and security laws (e.g., HIPAA, ITAR, classified, etc.). It is the responsibility of the faculty partner to ensure that no protected data is stored on the systems.

  • Particularly in the case of group storage, please keep in mind that such spaces are, by design, accessible by others and should not be used to store private information such as grades, login credentials, or personal data.

Login / SSH

It is not possible to log in directly to Fortress via SSH, SCP, or SFTP. You may access your files there efficiently using HSI or HTAR. Windows Network Drive/SMB access is possible, though with significant performance loss.

All ITaP research systems may access Fortress via HSI or HTAR without any Kerberos keytab preparation. However, if for some reason you lose your keytab, you may easily regenerate one on any ITaP research system by running the command fortresskey.

To access Fortress from a personal or departmental computer, you will first need to copy your Kerberos keytab file to the computer you wish to use. This keytab can be found in your research home directory, within the hidden subdirectory named ".private" as the file "hpss.keytab" (.private/hpss.keytab). This keytab will allow you to access HPSS services without needing to type a password and will remain valid for 90 days. Your keytab on ITaP research systems will be regenerated automatically after this time, and you will then need to copy the new keytab file to any other computers you use to access Fortress directly.
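
Because a copied keytab stops working after 90 days, a small check on the remote computer can flag when it is time to copy a fresh one. The following is a minimal sketch: keytab_is_stale is a hypothetical helper, the local path in the example is an assumption, and the check is only meaningful if the file's modification time reflects when the keytab was generated (for example, if it was copied with scp -p).

```shell
# Succeed if a keytab file was last modified more than MAX_DAYS days ago.
# keytab_is_stale is a hypothetical helper; 90 days is the validity
# period stated above.
keytab_is_stale() {
    keytab=$1
    max_days=${2:-90}
    # find prints the path only if the file is older than max_days days;
    # a non-empty result means the keytab is stale.
    [ -n "$(find "$keytab" -mtime "+${max_days}" -print)" ]
}

# Example, run on the personal or departmental computer
# (the path is an assumption; use wherever you stored the copy):
#   if keytab_is_stale "$HOME/.private/hpss.keytab"; then
#       echo "Keytab is more than 90 days old; copy a fresh one."
#   fi
```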

If you do not have an account on any ITaP research systems other than Fortress, you will need to generate a keytab file using the web interface:

Passwords

Fortress does not use your ITaP or Purdue Career Account password. Instead, it uses a Kerberos keytab to control access. See the Login section above for more details.

File Storage and Transfer for Fortress

Storage Options

File storage on Fortress consists solely of long-term or permanent storage. Home directories on Fortress are the long-term or permanent storage filesystems.

Home Directories

Your home directory on Fortress is the default directory in which your archive files are stored.

On Fortress, your home directory will be in the /archive/fortress/home/ file system, and will not be the same as your home directory on any other ITaP system. Your home directory on Fortress is your long-term storage directory for all ITaP systems.

$ pwd
/archive/fortress/home/myusername

Long-Term Storage

Long-term Storage or Permanent Storage is available to ITaP research users on the HPSS archival storage system, commonly referred to as "Fortress". HPSS is a software package that manages a hierarchical storage system. Program files, data files and any other files which are not used often, but which must be saved, can be put in permanent storage. Fortress currently has a 1.3 PB capacity.

Files smaller than 100 MB have their primary copy stored on low-cost disks (disk cache), but the second copy (backup of disk cache) is on tape or optical disks. This provides a rapid restore time to the disk cache. However, the large latency to access a larger file (usually involving a copy from a tape cartridge) makes Fortress unsuitable for direct use by any processes or jobs, even where such use is possible. The primary and secondary copies of larger files are stored on separate tape cartridges in the ADIC tape library.

To ensure optimal performance for all users, and to keep the Fortress system healthy, please remember the following tips:

  • Fortress operates most effectively with large files - 1GB or larger. If your data consists of smaller files, use HTAR to directly create archives in Fortress.
  • When working with files on cluster head nodes, use your home directory or a scratch file system, rather than editing or computing on files directly in Fortress. Copy any data you wish to archive to Fortress after computation is complete.
  • The HPSS software does not handle sparse files (files with empty space) in an optimal manner. Therefore, if you must copy a sparse file into HPSS, use HSI rather than the cp or mv commands.
  • Due to the sparse files issue, the rsync command should not be used to copy data into Fortress through NFS, as this may cause problems with the system.
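
To judge whether a file is likely to be sparse before copying it, one common heuristic is to compare its apparent size with the disk space actually allocated for it. The sketch below assumes standard wc, du, and cut; both helper names are hypothetical, and the result is only a hint, since filesystem compression or delayed allocation can also make a dense file appear smaller on disk.

```shell
# Decide from two numbers whether a file looks sparse: its apparent size
# (bytes) is larger than the space actually allocated for it (KB).
# Both helpers are hypothetical and not part of HPSS.
looks_sparse() {
    apparent_bytes=$1
    allocated_kb=$2
    [ "$((allocated_kb * 1024))" -lt "$apparent_bytes" ]
}

# Gather the two numbers for a real file and apply the check.
is_sparse_file() {
    file=$1
    apparent=$(wc -c < "$file" | tr -d ' ')
    allocated=$(du -k "$file" | cut -f1)
    looks_sparse "$apparent" "$allocated"
}

# Example:
#   if is_sparse_file myfilename; then
#       echo "myfilename looks sparse; copy it with HSI, not cp, mv, or rsync"
#   fi
```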

Fortress writes two copies of every file either to two tapes, or to disk and a tape, to protect against medium errors. Unfortunately, Fortress does not automatically switch to the alternate copy when it has trouble accessing the primary. If it seems to be taking an extraordinary amount of time to retrieve a file (hours), please either email rcac-help@purdue.edu or call ITaP Customer Service at 765-494-4000. We can then investigate why it is taking so long. If it is an error on the primary copy, we will instruct Fortress to switch to the alternate copy as the primary and recreate a new alternate copy.

Lost Long-Term Storage File Recovery

Data on Fortress is not backed up elsewhere in a traditional sense. New and modified files in the disk cache are migrated to tape within 30 minutes, and Fortress maintains two copies of every file on different media to protect against media failures, but there is no backup protecting against user changes.

If you remove or overwrite a file on Fortress, it is gone. You cannot request to have it retrieved.

Storage Quotas / Limits

There is currently no quota on Fortress disk use. Although it may seem an infinite amount of space, we expect Fortress to fill up just like any other storage device.

If you are a long-time Fortress user, you may be accustomed to a monthly email report showing your current Fortress usage. This report is not currently available on HPSS, but we hope to resume it in the near future.

Files belonging to deleted accounts will be retained after the accounts have been terminated, but will be accessible only by special request. The files will be kept for no more than ten years, or for as long as the media on which they are stored remain usable, whichever is shorter.

Archive and Compression

There are several options for archiving and compressing groups of files or directories on ITaP research systems. ITaP provides the following tools:

  • zip   (more information)
    Simple compression and file packaging utility.
    Examples:
      (extract contents of somefile.zip)
    $ unzip somefile.zip
    
      (compress file somefile.c)
    $ zip somefile.zip somefile.c
    
      (compress all files in a directory into one archive file)
    $ zip -r somefile.zip somedirectory/
    
      (compress all ".c" files in current directory into one archive file)
    $ zip -r somefile.zip . -i \*.c
    
  • 7zip   (more information)
    Simple compression and file packaging utility which offers much better compression than zip.
    Examples:
      (extract contents of somefile.7z)
    $ 7za e somefile.7z
    
      (compress file somefile.c)
    $ 7za a somefile.7z somefile.c
    
      (compress all files in a directory into one archive file)
    $ 7za a somefile.7z somedirectory/
    
      (compress all ".c" files in current directory into one archive file)
    $ 7za a somefile.7z *.c
    
  • tar   (more information)
    Saves many files together into a single archive file, and restores individual files from the archive. Includes automatic archive compression/decompression options and special features for incremental and full backups.
    Examples:
      (list contents of archive somefile.tar)
    $ tar tvf somefile.tar
    
      (extract contents of somefile.tar)
    $ tar xvf somefile.tar
    
      (extract contents of gzipped archive somefile.tar.gz)
    $ tar xzvf somefile.tar.gz
    
      (extract contents of bzip2 archive somefile.tar.bz2)
    $ tar xjvf somefile.tar.bz2
    
      (extract contents of xz archive somefile.tar.xz)
    $ tar xJvf somefile.tar.xz
    
      (archive file somefile.c)
    $ tar cvf somefile.tar somefile.c
    
      (archive all ".c" files in current directory into one archive file)
    $ tar cvf somefile.tar *.c
    
      (archive all files in a directory into one archive file)
    $ tar cvf somefile.tar somedirectory/
    
      (archive and gzip-compress all files in a directory into one archive file)
    $ tar czvf somefile.tar.gz somedirectory/
    
      (archive and bzip2-compress all files in a directory into one archive file)
    $ tar cjvf somefile.tar.bz2 somedirectory/
    
      (archive and xz-compress all files in a directory into one archive file)
    $ tar cJvf somefile.tar.xz somedirectory/
    
  • gzip   (more information)
    Compression utility designed as a replacement for compress, with much better compression and no patented algorithms. The standard compression system for all GNU software.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ gzip somefile
    
      (uncompress file somefile.gz - also removes compressed file)
    $ gunzip somefile.gz
    
  • bzip2   (more information)
    Strong, lossless data compressor based on the Burrows-Wheeler transform. Stronger compression than gzip.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ bzip2 somefile
    
      (uncompress file somefile.bz2 - also removes compressed file)
    $ bunzip2 somefile.bz2
    
  • xz   (more information)
    Strong, lossless data compressor based on the LZMA2 compression algorithm. Stronger compression than gzip or bzip2.
    Examples:
      (compress file somefile - also removes uncompressed file)
    $ xz somefile
    
      (uncompress file somefile.xz - also removes compressed file)
    $ unxz somefile.xz
    
  • compress   (more information)
    Adaptive Lempel-Ziv compressor. Not often used today.
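
Before removing the original files, it can be worth confirming that a new archive actually lists everything it should. The sketch below creates a gzip-compressed tar archive and compares the number of non-directory entries it lists with the number found on disk. The helper name archive_and_check is hypothetical, and the archive should be written outside the directory being archived.

```shell
# Create a gzip-compressed archive of a directory and confirm that the
# archive lists as many non-directory entries as the directory contains.
# archive_and_check is a hypothetical helper name.
archive_and_check() {
    dir=$1
    archive=$2
    tar czf "$archive" "$dir" || return 1
    # Listed entries ending in "/" are directories; count everything else.
    in_archive=$(tar tzf "$archive" | grep -v '/$' | wc -l | tr -d ' ')
    on_disk=$(find "$dir" ! -type d | wc -l | tr -d ' ')
    [ "$in_archive" -eq "$on_disk" ]
}

# Example:
#   if archive_and_check somedirectory somefile.tar.gz; then
#       echo "Archive lists every file in somedirectory."
#   fi
```

A matching count does not prove the contents are identical, but it catches an archive that was cut short.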

Windows users can work with these same formats using some of the following software:

  • 7-Zip
    Free Windows software package that can handle all the above formats.
  • WinZip
    Commercial Windows software package that can handle all the above formats.
  • WinRAR
    Commercial Windows software package that can handle all the above formats.

Fortress Frequently Asked Questions (FAQ)

How does HPSS differ from DXUL in accessing my data?

  • Plain-text FTP is not supported, due to the unacceptable security risk. Instead we now have the more powerful HSI and HTAR, which are both secure and parallelize data transfers so that the full network bandwidth can be utilized.
  • NFS mounting is limited to ITaP research cluster front-end hosts. This restriction is due to the potential for NFS to cause overall system performance issues.
  • NFS mounts outside of the rcac.purdue.edu domain, such as on departmental systems, are not supported.
  • HPSS does not support direct SFTP and SCP transfers. However, these protocols may be used to connect to ITaP research cluster front-end hosts where you have an account and to reach Fortress through the NFS-mounted directory there. Transfers made this way will not be able to take advantage of the optimization of the new system to parallelize data streams.
  • Access via Windows Network Drive/SMB will remain, though this protocol will not be able to take advantage of the optimization of the new system to parallelize data streams.
  • HPSS does not support Unicode filenames. All filenames must contain only ASCII characters.
  • HPSS does not support sparse files. Therefore, using the rsync command to copy data in through Fortress' NFS gateway is not recommended.

What is the best way to access my data?

  • HSI provides a Unix-style interface taking advantage of the power of HPSS without requiring any special user knowledge.
  • HTAR is a utility to aggregate a set of files into a single tar archive directly into HPSS, without requiring scratch space to first create an archive.

Are there any limitations with HSI or HTAR?

  • HTAR has an individual file size limit of 64GB. If any files you are trying to archive with HTAR are greater than 64GB, then HTAR will immediately fail. This does not limit the number of files in the archive or the total overall size of the archive. To get around this limitation, try using the htar_large command. It is slower than using HTAR but it will work around the 64GB file size limit.
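
Since HTAR fails as soon as it meets an oversized file, it can save time to look for such files before starting. The sketch below lists files above a byte limit; list_oversized is a hypothetical helper, and the default limit treats "64GB" as 64 * 1024^3 bytes, which is an assumption about how the limit is measured.

```shell
# List regular files under DIR that are larger than LIMIT bytes.
# The default of 68719476736 bytes assumes "64GB" means 64 * 1024^3;
# list_oversized is a hypothetical helper name.
list_oversized() {
    dir=$1
    limit=${2:-68719476736}
    # "-size +Nc" matches files larger than N bytes.
    find "$dir" -type f -size "+${limit}c"
}

# Example:
#   if [ -n "$(list_oversized mydirectory)" ]; then
#       echo "Some files are too large for htar; consider htar_large."
#   fi
```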

Can I download HSI or HTAR binaries for my OS platform?

  • Microsoft Windows
        Windows Installation Instructions
  • Mac OS X
        Mac DMG
  • RedHat / Fedora
        32-bit x86 RPM
        RHEL5/6 Repository (add to /etc/yum.repos.d/rcac-public.repo)
  • Ubuntu / Debian
        32-bit x86 DEB
        64-bit x86 DEB
        Debian/Ubuntu Repository (append contents to /etc/apt/sources.list)
  • Solaris 10
        32-bit x86 PKG
        Sparc PKG
  • Note: If your username on your desktop does not match your career account username, HSI and HTAR require configuration to connect using your career account username. Issue the command echo "principal = careeraccount" >> ~/.hsirc to configure the current and any future sessions to use the correct username. You may also set the HPSS_PRINCIPAL environment variable to your career account username or use the -l careeraccount option for HSI.
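
Running the echo command from the note more than once appends a duplicate line each time. As a minimal sketch, the hypothetical helper below adds the principal line only when the file does not already contain one; the optional second argument exists purely so the function can be tried against a scratch file.

```shell
# Add "principal = NAME" to an HSI configuration file only if no principal
# line is present yet, so repeated runs do not append duplicates.
# set_hsi_principal is a hypothetical helper name.
set_hsi_principal() {
    name=$1
    rcfile=${2:-$HOME/.hsirc}
    if [ -f "$rcfile" ] && grep -q '^principal[[:space:]]*=' "$rcfile"; then
        return 0
    fi
    echo "principal = $name" >> "$rcfile"
}

# Example (replace careeraccount with your career account username):
#   set_hsi_principal careeraccount
# Alternative for the current shell session only:
#   export HPSS_PRINCIPAL=careeraccount
```

If the file already names a different principal, the helper leaves it untouched, so an incorrect entry still needs to be edited by hand.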

Do I need to do anything to my firewall to access Fortress?

  • Any machines using HSI or HTAR must have all firewalls (local and departmental) configured to allow open access from the following IP addresses:

    • 128.210.251.141
    • 128.210.251.142
    • 128.210.251.143
    • 128.210.251.144
    • 128.210.251.145
  • If you are unsure of how to modify your firewall settings, please consult with your department's IT support or the documentation for your operating system. Access to Fortress is restricted to on-campus networks. If you need to directly access Fortress from off-campus, please use the Purdue VPN service before connecting.
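
For a Linux workstation that happens to use iptables, the rules might look something like what the sketch below prints. This is only an illustration under that assumption: nothing is applied, the helper name print_fortress_rules is hypothetical, and the exact chain and rule order depend on your existing firewall configuration, so review the output with your IT support before using it.

```shell
# Print one iptables rule per Fortress address listed above.
# Assumes a Linux host that uses iptables; nothing is applied here.
fortress_ips="128.210.251.141 128.210.251.142 128.210.251.143 128.210.251.144 128.210.251.145"

print_fortress_rules() {
    for ip in $fortress_ips; do
        echo "iptables -A INPUT -s $ip -j ACCEPT"
    done
}

print_fortress_rules
```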

Can I set up a shared space for my research group to share data?

  • Research groups may, upon request, be assigned a group data storage space within Fortress to facilitate sharing of research data. Access to this space is controlled through a UNIX group assigned to the research group.
  • ITaP research resources are not intended to store data protected by Federal privacy and security laws (e.g., HIPAA, ITAR, classified, etc.). It is the responsibility of the faculty partner to ensure that no protected data is stored on the systems.

    • Particularly in the case of group storage, please keep in mind that such spaces are, by design, accessible by others and should not be used to store private information such as grades, login credentials, or personal data.
  • Contact us at rcac-help@purdue.edu to create a group space for your group.

How can I fix the error: "put: Error -5 on transfer" when I use HSI/HTAR from my workstation?

  • First, check your firewall settings, and ensure that there are no firewall rules interfering with connecting to Fortress. If firewalls are not responsible:
  • Open the file named /etc/hosts on your workstation, especially if you run a Debian or Ubuntu Linux distribution.
  • Look for a line like:
    127.0.1.1  hostname.dept.purdue.edu hostname
  • Replace the IP address 127.0.1.1 with the real IP address for your system.
  • If you don't know your IP address, you can find it with the command:
    host `hostname --fqdn`
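
The check described in the steps above can be sketched as a short function that reports any 127.0.1.1 entries along with their line numbers. The helper name find_loopback_alias is hypothetical; it only reads the file, and any change to /etc/hosts still has to be made by hand with administrator rights.

```shell
# Print any lines in a hosts file that map a name to 127.0.1.1,
# prefixed with their line numbers. Succeeds only if a match is found.
# find_loopback_alias is a hypothetical helper name.
find_loopback_alias() {
    hosts_file=${1:-/etc/hosts}
    grep -n '^[[:space:]]*127\.0\.1\.1[[:space:]]' "$hosts_file"
}

# Example:
#   if find_loopback_alias; then
#       echo "Replace 127.0.1.1 with the real IP address of this system."
#   fi
```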