Hathi consists of two components: the Hadoop Distributed File System (HDFS), and a MapReduce framework for job and task tracking.
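To make the MapReduce half of that picture concrete, here is a minimal, self-contained sketch of the map / shuffle / reduce programming model in Python. This is illustrative only; it is not Hathi's job-submission API, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical.

```python
from collections import defaultdict

# Illustrative sketch of the MapReduce model: count words across
# several input "splits" (stand-ins for HDFS file blocks).

def map_phase(split):
    # Map: emit a (word, 1) pair for every word in an input split.
    for word in split.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

splits = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"])  # 3
print(counts["fox"])  # 2
```

On a real cluster, the framework runs the map and reduce functions in parallel across nodes and handles the shuffle over the network; the logic above is what each phase computes.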
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. While it shares many traits with existing distributed file systems, it differs from them in important ways: it is highly fault-tolerant, designed for deployment on low-cost hardware, and provides high-throughput access to application data, making it well suited to applications with large data sets. To enable streaming access to file system data, HDFS relaxes a few POSIX requirements.
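HDFS stores files as fixed-size blocks replicated across nodes. A quick back-of-the-envelope sketch, assuming a 128 MB block size and a replication factor of 3 (common Apache Hadoop defaults; Hathi's actual settings may differ):

```python
import math

# Assumed HDFS parameters -- both are cluster-configurable, and these
# values are common defaults, not Hathi's confirmed configuration.
BLOCK_SIZE_MB = 128
REPLICATION = 3

def hdfs_footprint(file_size_mb):
    """Return (block_count, raw_storage_mb) for a single file."""
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return blocks, file_size_mb * REPLICATION

# A 10 GB file:
blocks, raw_mb = hdfs_footprint(10 * 1024)
print(blocks)  # 80 blocks
print(raw_mb)  # 30720 MB of raw storage consumed cluster-wide
```

The replication is what buys the fault tolerance described above: losing a node (or a disk) leaves two other copies of each of its blocks elsewhere in the cluster.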
A Hadoop cluster has several components:

- NameNode: manages the HDFS namespace and the mapping of files to blocks.
- DataNodes: store HDFS blocks on each node's local disks.
- JobTracker: accepts MapReduce jobs and schedules their tasks across the cluster.
- TaskTrackers: run the individual map and reduce tasks on the worker nodes.
To request access to Hathi today, please email firstname.lastname@example.org. Subscribe to our Community Cluster Program Mailing List to stay informed on the latest purchasing developments, or email us at email@example.com if you have any questions.
Hathi consists of Dell R720xd servers, each with 16 Intel E5-2650v2 cores (two 8-core processors), 32 GB of memory, and 48 TB of local storage, connected by a 40 Gigabit Ethernet interconnect.
| Number of Nodes | Processors per Node | Cores per Node | Memory per Node | HDFS Storage per Node | Interconnect |
|---|---|---|---|---|---|
| 6 | 2 Intel E5-2650v2 | 16 | 32 GB | 48 TB | 40 GigE |
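The raw figures in the table translate into usable HDFS capacity only after replication. A rough estimate, assuming a replication factor of 3 (a common default, not a confirmed Hathi setting) and ignoring OS and non-HDFS overhead:

```python
# Cluster figures from the hardware table above.
NODES = 6
STORAGE_PER_NODE_TB = 48

# Assumed HDFS replication factor (common default; not confirmed for Hathi).
REPLICATION = 3

raw_tb = NODES * STORAGE_PER_NODE_TB
usable_tb = raw_tb / REPLICATION
print(raw_tb)     # 288 TB raw
print(usable_tb)  # 96.0 TB usable at replication 3
```

So the cluster offers on the order of 288 TB of raw disk, or roughly 96 TB of effective HDFS capacity under these assumptions.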
Hathi nodes run Red Hat Enterprise Linux 6 and use the Pivotal HD Hadoop distribution for resource and job management. Operating system patches are applied as security needs dictate. All nodes allow unlimited stack usage and unlimited core dump size (though disk space and server quotas may still be a limiting factor).
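You can verify the stack and core-dump limits of your own processes with Python's standard `resource` module (Unix-only). On Hathi nodes both should report unlimited; on other machines you will likely see finite soft limits.

```python
import resource

# Query the soft and hard limits for stack size and core dump size.
# RLIM_INFINITY indicates "unlimited".
for name, limit in [("stack", resource.RLIMIT_STACK),
                    ("core", resource.RLIMIT_CORE)]:
    soft, hard = resource.getrlimit(limit)
    fmt = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

The equivalent shell check is `ulimit -s` (stack) and `ulimit -c` (core dump size).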
All Purdue faculty, staff, and students with the approval of their advisor may request access to Hathi. Refer to the Accounts / Access page for more details on how to request access.