Faster research network will speed the movement of research data on Purdue’s campus and off

August 26, 2014

Farmers in the fields of the future growing food in the amounts needed to feed the world are likely to do it with assistance from a steady data stream — one generated by a network of tiny sensors set among the plants and in sensor-laden remote-control mini aircraft flying overhead, coupled with the computing software and hardware, accessible via the Internet, to make sense of all that data.

At Purdue, where one focus of President Mitch Daniels’ Purdue Moves initiative is plant science with the aim of meeting future food needs, the technology for that kind of data-driven farming already is being developed in University research fields and labs. And the data is beginning to flow.

“We’re enabling that grand vision,” says Pat Smoker, director of information technology for Purdue’s College of Agriculture.

The faster Purdue researchers can move and work with that data, the faster the time to discovery and the sooner the vision may become reality, which is why Smoker, for one, was pleased to hear about a newly upgraded campus research network 58 times faster at its core than the old network.

The faster, more robust research network should allow Purdue faculty, staff and student researchers to move data for computation, analysis, storage and sharing more easily on campus and nationwide. The upgrade includes state-of-the-art, high-speed network traffic routers to link Purdue’s Community Cluster Program supercomputers, instruments and other research equipment to research data storage resources like the new Research Data Depot.

The Research Data Depot will make over 2 petabytes of storage available this fall to Purdue research groups or campus units in need of a high-capacity central solution for safely storing large, active research data sets and sharing them with both on-campus and off-campus collaborators. The new service from ITaP Research Computing (RCAC) will move data over the upgraded research network, whether to the community clusters, office and lab computers for computation and analysis, or from high-performance instruments that generate large data streams.

In addition, the network upgrade lets Purdue researchers take full advantage of the 100-gigabit pipeline from Purdue to high-speed national research networks like Internet2.

Under a new agreement between Purdue and Globus, the network and Research Data Depot also will be integrated with the Globus service, including a Purdue-branded Globus access website. Globus, which bills itself as something like Dropbox for scientists, offers an easy, fast, secure way to move large research data sets to and from campus and share them among collaborators.

“We’ve removed a large number of bottlenecks,” says Michael Shuey, ITaP’s research infrastructure architect.

Smoother data movement on campus, off campus, and back and forth between the two is increasingly important for Purdue researchers as the amount and size of the data they use continue to grow.

The Purdue Genomics Core Facility plans to use the Research Data Depot and should benefit from both the new storage option and the upgraded research network. Purdue’s only biological sequencing center, the facility generates sequences for researchers around campus studying target organisms ranging from honeybees to whales and fungi to apples.

Rick Westerman, bioinformatics specialist at the genomics facility, said it faces two challenges the faster network could help alleviate. It reads and writes a lot of large files to storage as the sequences its instruments produce are, in effect, edited and annotated. In addition, some programs the facility employs create flurries of small files, a challenge because of the number of files that have to be moved rather than their size.

The faster research network also is important for research like that of Purdue’s CMS Tier-2 Center. CMS stands for Compact Muon Solenoid, one of four big experiments associated with the Large Hadron Collider, the international particle accelerator project exploring how physics, space, time and other important things in the universe work. The Purdue center is a keystone in examining data produced by the particle collisions in the accelerator, which is located in Europe.

“They’re moving a large amount of data to Purdue from outside and also moving data on campus to Purdue’s five flagship research cluster supercomputers,” Shuey says.

The upgraded network also puts Purdue in position to accommodate its next generation of research supercomputers as well as data storage resources and data-generating instruments.

Originally posted: August 26, 2014  2:47pm