NSF awards $3.5 million grant to build powerful web platform for data-driven science
October 12, 2017
A team led by Purdue senior research scientist Ann Christine Catlin has been awarded a four-year, $3.5 million grant from the National Science Foundation to build a web platform that provides a full spectrum of data services, connects data to computational resources and statistical software, and tracks research workflows to link data, algorithms and results.
The platform, known as Digital Environment for Enabling Data-driven Science (DEEDS), will be developed in collaboration with research teams from four science and engineering fields – chemistry, nutrition science, environmental science, and electrical engineering – each with different data and computational needs.
“Some disciplines have invested heavily in customized environments that support their research workflows end-to-end, but most research groups carry out their investigations in an ad hoc way, without infrastructure that provides them with support for collecting, organizing and preserving their data, for launching and tracking their computations, for exploring and sharing their workflows and results,” says Catlin. “We want to help these researchers by providing sophisticated data, computation and workflow
Catlin’s co-PIs will use DEEDS in their own research to collect, share and explore data and computational workflows and to publish data, code, analyses and results. The tracking of research workflows by DEEDS is a key innovation, making the path to research results traceable and enabling interpretation, vetting and reuse of those results.
One of the co-PIs, Marisol Sepúlveda, professor of ecology and natural systems, is funded by the Department of Defense to develop amphibian toxicity reference values for ecological risk assessment at contaminated sites. The research informs decisions on exposure mitigation and federal regulations for pollution control.
“Researchers must be able to trace backward from any final result to the timestamp and value of the primary observation,” says Sepúlveda. “It must be readily apparent and understandable how data points were selected, combined and transformed.”
The other co-PIs are:
- Ashraf Alam, Jai N. Gupta Professor of Electrical and Computer Engineering. His research investigates solar photovoltaic (PV) systems by coupling local weather information, solar farm health data and manufacturer-specific PV technology to determine efficiency degradation and to predict system lifetime. Results will be presented to manufacturing and industry groups to influence how modules are fabricated, operated and monitored.
- Connie Weaver, distinguished professor of nutrition science. Her laboratory, funded by the National Institutes of Health, studies the effect of diet quality and sodium intake on risk factors for cardiovascular disease in adolescents. The research will provide reference data that medical institutions, professional societies and government agencies can use to determine food guidance and set dietary recommendations.
- Joseph Francisco, dean of the College of Arts and Sciences and Elmer H. and Ruby M. Cordes Chair in Chemistry at the University of Nebraska-Lincoln. His research involves basic studies in spectroscopy, kinetics, and photochemistry of transient species in the gas phase. He uses orbital methods to predict properties and provide reference data that can guide the experimental search for these species.
The co-PIs will help guide the development of user interfaces, features, capabilities and tools for DEEDS, ensuring usability and guaranteeing full support for their use cases. Catlin notes that the greatest challenge will be making DEEDS easy to use.
“We want researchers to define complex data models by clicking on spreadsheets, and to explore data with viewers that interpret data types for visualization and analysis,” she says. “We want to automatically launch computational code for user-selected input, and automatically return and link output results to the code and input.”
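The idea of linking output results to the code and input, and of tracing any result back to its primary observations, can be illustrated with a minimal Python sketch. This is not DEEDS code: the names `Observation`, `Result`, `run_tracked` and `trace`, and the sample values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Union


@dataclass(frozen=True)
class Observation:
    """A primary observation: a labeled value and the time it was recorded."""
    label: str
    value: float
    timestamp: datetime


@dataclass(frozen=True)
class Result:
    """A computed value linked to the code and inputs that produced it."""
    value: float
    code_name: str
    inputs: tuple  # each item is an Observation or an earlier Result


Node = Union[Observation, Result]


def run_tracked(func: Callable[[List[float]], float], inputs: List[Node]) -> Result:
    """Run func on the input values and record which code and inputs were used."""
    value = func([node.value for node in inputs])
    return Result(value=value, code_name=func.__name__, inputs=tuple(inputs))


def trace(node: Node) -> List[Observation]:
    """Walk backward from any result to the primary observations behind it."""
    if isinstance(node, Observation):
        return [node]
    found: List[Observation] = []
    for parent in node.inputs:
        found.extend(trace(parent))
    return found


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def double(values: List[float]) -> float:
    return 2 * values[0]


# Hypothetical sample data, for illustration only.
observations = [
    Observation("sample-1", 2.0, datetime(2017, 10, 1, tzinfo=timezone.utc)),
    Observation("sample-2", 4.0, datetime(2017, 10, 2, tzinfo=timezone.utc)),
]
average = run_tracked(mean, observations)
final = run_tracked(double, [average])

for obs in trace(final):
    print(obs.label, obs.value, obs.timestamp.isoformat())
```

Each `Result` keeps a reference to its inputs, so the chain from a final value to the timestamp and value of every primary observation is recoverable without a separate log.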
DEEDS builds on DataHub, a platform created by Catlin for preserving and publishing research data for discovery and exploration. DEEDS will transform DataHub from a data-sharing platform into a full research support environment.
Like DataHub, DEEDS will take advantage of HUBzero, Purdue’s ITaP-developed cyberinfrastructure for scientific collaboration that now powers more than 60 interactive, web-based hubs in fields such as nanotechnology, cancer research, earthquake engineering, pharmaceutical manufacturing, volcanology, environmental modeling, and biofuels.
Catlin’s development team is led by Chandima Hewa Nadungodage, a senior software engineer for ITaP Research Computing, and includes senior software engineers Steven Clark and Sumudinie Fernando and computer science doctoral student Andres Bejarano. Santiago Pujol, professor of civil engineering, and postdoctoral researcher Lucas Laughery, who collaborated with Catlin on DataHub, will work on usability and sustainability issues, and continue to promote the platform for preservation and discovery of civil engineering research data.