Overview of Hathi

Hathi is a Hadoop cluster operated by ITaP and available as a shared resource to partners in Purdue's Community Cluster Program. Hathi went into production on September 8, 2014.

Hathi consists of two main components: the Hadoop Distributed File System (HDFS) for storage, and a MapReduce framework that schedules jobs and tracks their tasks.
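The MapReduce model mentioned above can be sketched with a toy word count in plain Python. The function names and the in-memory shuffle below are illustrative, not part of Hathi or Hadoop itself; on the cluster, the framework runs the same map, shuffle, and reduce phases in parallel across the data nodes.

```python
from collections import defaultdict

# Toy word count illustrating the MapReduce model (names are illustrative,
# not Hathi-specific). On a real Hadoop cluster the framework distributes
# these phases across data nodes.

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between
    # the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    # Reduce: sum the counts emitted for one word.
    return word, sum(counts)

lines = ["hadoop stores data in HDFS", "MapReduce processes data in parallel"]
pairs = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
print(result["data"])  # "data" appears twice across the two lines
```

Because each map call sees only one line and each reduce call sees only one key, both phases can be spread over many nodes, which is what makes the model scale to large data sets.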

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it well suited to applications with large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.
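In practice, HDFS is accessed from a cluster front end through the standard `hdfs dfs` command line. A short session might look like the following; the paths shown are illustrative, and the commands assume an active HDFS cluster and home directory:

    # Copy a local file into HDFS (paths are illustrative)
    hdfs dfs -put input.txt /user/myusername/input.txt

    # List the directory and read the file back
    hdfs dfs -ls /user/myusername
    hdfs dfs -cat /user/myusername/input.txt

    # Retrieve job output from HDFS to local storage
    hdfs dfs -get /user/myusername/output results/

Note that files must be staged into HDFS before MapReduce jobs can read them; HDFS is a separate namespace from the node-local and networked file systems.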

A Hadoop cluster has several components:

  • Name Node
  • Data Node
  • Resource Manager
  • Node Manager

Hathi Detailed Hardware Specification

Hathi consists of 6 Dell compute nodes, each with two 8-core Intel E5-2650v2 CPUs, 32 GB of memory, and 48 TB of local storage, for a total cluster capacity of 288 TB. All nodes have 40 Gigabit Ethernet interconnects and a 5-year warranty.
  Sub-Cluster:          Hathi
  Number of Nodes:      6
  Processors per Node:  Two 8-core Intel E5-2650v2 (plus 48 TB local storage)
  Cores per Node:       16
  Memory per Node:      32 GB
  Interconnect:         40 Gigabit Ethernet
  TeraFLOPS:            N/A


© 2017 Purdue University | Maintained by ITaP Research Computing
