Purdue hosting National Science Foundation big data computing workshop

September 2, 2014  11:00am – 5:00pm
STEW 314

Purdue will host a workshop in September for students, post-doctoral researchers, faculty and staff looking to gain skills in working with big data using Hadoop, Spark and Urika for data processing and analysis.

The one-day session, also open to non-Purdue Indiana professionals, will take place from 11 a.m. to 5 p.m. on Tuesday, Sept. 2, in the Stewart Center, Room 314. Space is limited, so participants should register soon. There is no cost to register.

The National Science Foundation and ITaP are sponsoring the event. Purdue is the only school in Indiana hosting the big data workshop.

Participants register with the National Science Foundation Extreme Science and Engineering Discovery Environment (XSEDE), in which Purdue is a partner. A free XSEDE account can be created on the XSEDE user portal at portal.xsede.org. Once you have an account, you can register through the XSEDE portal.

The workshop will give participants hands-on experience compiling, packaging, submitting, monitoring, and collecting the output of a Hadoop job. Attendees will also learn how to integrate the Spark platform and its concept of resilient distributed datasets with Hadoop, and get an overview of the graph analytic approach to data analysis employed by Urika.

The workshop is part of a series of high-performance computing training sessions being held by XSEDE. ITaP plans to host others at Purdue over the 2014-15 school year, says Stephen Harrell, a senior high-performance computing system administrator who coordinates training for ITaP Research Computing (RCAC).

The big data workshop is delivered nationwide using high-definition video conferencing to allow participants to interact in real time with course instructors from the Pittsburgh Supercomputing Center and to work in person with local colleagues and experts. At Purdue, staff from ITaP Research Computing (RCAC) will be on hand.

ITaP Research Computing (RCAC) operates Purdue’s community cluster supercomputers and the new Research Data Depot, a high-capacity storage system for active data sets. Both are ready-made tools for big data projects by Purdue researchers.

For more information, email rcac-help@purdue.edu.

Originally posted: August 19, 2014  4:08pm