Data data everywhere, including HUBzero hubs, where it’s ready for researchers to put it to work

January 17, 2014
Science Highlights

Thymic cancer is rare enough that one doctor, or even one hospital, never sees more than a few cases, making it difficult to better understand the stages of the disease and how to treat it.

Aggregated internationally, however, there are enough cases to start closing the information gap, if only data on the staging and treatment of thymic malignancies could be gathered in one place and made accessible for mining. But a centralized repository for the worldwide collection of patient data had never been attempted.

Enter Purdue’s HUBzero platform, along with the ITaP Research Computing (RCAC) group headed by Ann Christine Catlin. For years, Catlin’s team has been designing and building data collection, management, sharing and analysis capabilities into HUBzero for research focused on topics ranging from pediatric HIV to pharmaceutical manufacturing.

Catlin's database development group created a research database system for ITMIG, the International Thymic Malignancy Interest Group, an organization working to advance understanding and treatment of cancer of the thymus. The ITaP group built a retrospective database for gathering global historical patient data and a prospective database for medical professionals the world over to start entering new data on current patients.

Data for the retrospective database was contributed using a simple spreadsheet template. Hospitals around the world collected historical patient data into spreadsheets and then uploaded the spreadsheets, which the hub database incorporated automatically. ITMIG had hoped 1,000 cases could be collected, and predicted it would take about a year to do so.
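The template-based workflow described above can be illustrated with a minimal sketch. It is not the ITMIG or HUBzero implementation, which the article does not describe. It assumes each upload arrives as CSV text, uses Python's standard csv and sqlite3 modules, and checks the header against a hypothetical four-column template before adding the rows to a shared table. The names TEMPLATE_COLUMNS, make_db and ingest_upload are invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical template columns; the real ITMIG template is not given in the article.
TEMPLATE_COLUMNS = ["hospital", "case_id", "stage", "treatment"]


def make_db():
    """Create an in-memory database with one shared table for all contributed cases."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cases (hospital TEXT, case_id TEXT, stage TEXT, treatment TEXT)"
    )
    return conn


def ingest_upload(conn, csv_text):
    """Append the rows of one uploaded spreadsheet (as CSV text) to the shared table.

    Returns the number of rows added. Raises ValueError if the header row
    does not match the template exactly.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != TEMPLATE_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = [tuple(row[col] for col in TEMPLATE_COLUMNS) for row in reader]
    conn.executemany("INSERT INTO cases VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Rejecting any upload whose header differs from the template is one simple way to keep contributions from many hospitals comparable. A production system would also need to validate individual values and protect patient privacy.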

“In three months we had more than 8,000 cases from 110 hospitals in 21 countries,” says Catlin, a senior research scientist for ITaP. “The data is now being used to validate a new staging classification for thymic malignancy and also to investigate rates of recurrence and survival.”

“It's a quantum leap forward,” says Dr. Frank Detterbeck, the founder and current chairman of ITMIG. “The engagement and collaboration of people spread around the globe has been simply astounding.”

“For a rare tumor, it's unprecedented,” adds Detterbeck, professor and chief of thoracic surgery at the Yale University School of Medicine and associate director of the Yale Cancer Center. “It allows us in a real-time way to move insights about better outcomes into practice around the world.”

Originally developed by ITaP to power nanoHUB.org, HUBzero is a web-based platform for building research and educational collaborations. A major feature is its ability to deploy computational research codes, and visualize and analyze results, all through a Web browser. Built-in social networking creates communities in almost any field or subject matter and facilitates collaboration and distribution of research results as well as training and educational materials.

But HUBzero didn’t start life with extensive, sophisticated data handling capability. That began when Catlin developed data technologies on cceHUB to support a collaborative community of investigators translating colorectal cancer research into clinical practice. It wasn’t long before other hub owners took note and wanted data support, too.

“My group creates database technologies for research communities that need to collect and share data,” Catlin says. “The really cool part is that our HUBzero databases make it possible to explore data in many sophisticated ways, so that researchers worldwide can learn new things from the data.”

That's timely in this “big data” era, with researchers and others gathering ever more data and trying to glean ever more knowledge from it. Meanwhile, the National Science Foundation, the National Institutes of Health and other funding agencies now require data management plans in grant proposals, which has only upped the demand for database capability in hubs.

The more Catlin's team deals with very different kinds of research projects, the more database technology, features and capabilities they build into HUBzero. Take NEEShub, which supports the National Science Foundation's George E. Brown Jr. Network for Earthquake Engineering Simulation (NEES). Users needed to be able to explore maps and map-related data, so Catlin’s group added that capability.

The resulting systems also make it possible to collect, share and explore data on the impacts not only of earthquakes, but also of hurricanes, tornadoes, floods and other natural disasters. With the capability to add and annotate data collected in the field and to manage data in myriad forms, including photos, video and drawings, the technology is integral to a national disaster and failure studies HUBzero database that Purdue and the National Institute of Standards and Technology (NIST) are developing.

Among other things, Catlin and colleagues have also added features to meet privacy requirements for medical databases such as ITMIG's, as well as those for a project promoting patient safety and best practices for operating infusion pumps in hospitals and for research into psychosocial care for pediatric HIV patients.

Catlin's group includes ITaP's Sudheera Fernando, Sumudinie Fernando, Ruchith Fernando, Ruwan Gamage, Nabeel Yoosef and Tharindu Mathew.

As demand for database capability grew, ITaP’s Michael McLennan, the chief architect of HUBzero, asked Catlin to look at developing a way for users to submit their research data and have a hub database automatically created for them. This new system, called DataStore, was released in November 2013. The ability to import spreadsheets, part of DataStore, seemed like one good way to achieve the goal of simple database creation for user communities.

“Everybody knows how to use spreadsheets,” Catlin says. “They make it easy to collect and upload research data. With a spreadsheet, it takes only a few minutes to create your own searchable, online database using DataStore.”
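The idea of turning a spreadsheet into a searchable database can be sketched in a few lines. This is an illustration only and does not use or reflect DataStore's actual code or interface. It assumes the spreadsheet has been exported as CSV text with a header row, stores every value as text, and uses Python's standard csv and sqlite3 modules. The function names create_database and search, and the default table name, are hypothetical.

```python
import csv
import io
import sqlite3


def create_database(csv_text, table="dataset"):
    """Build an in-memory SQLite table whose columns come from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so that headers containing spaces still work as column names.
    columns = ", ".join('"{}" TEXT'.format(h.replace('"', '""')) for h in header)
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    # Skip blank lines, which csv.reader yields as empty lists.
    data_rows = (row for row in reader if row)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data_rows)
    conn.commit()
    return conn


def search(conn, column, term, table="dataset"):
    """Return every row whose value in the given column contains the search term."""
    quoted = column.replace('"', '""')
    sql = f'SELECT * FROM "{table}" WHERE "{quoted}" LIKE ?'
    return conn.execute(sql, (f"%{term}%",)).fetchall()
```

Calling create_database on the exported text and then search with a column name and a term returns the matching rows as tuples. Deriving the schema from the header row is what lets a user get a queryable table without designing a database first, at the cost of treating every value as plain text.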

Sending spreadsheets as email attachments is how Purdue Professor Connie Weaver says her innovative Camp Calcium project distributed data for 20 years. The program, begun in 1990, provides a fun and educational summer camp for children while also carefully monitoring calcium metabolism and absorption rates to inform international health standards.

When Catlin built a hub database for the Camp Calcium project, decades of data showing proper levels of calcium intake for kids became available to researchers, public health officials, nutritionists and physicians through the Indiana CTSI hub, the Web home of the Indiana Clinical and Translational Sciences Institute. The database also generates graphs of the research data, provides links to publications, and offers features for searching, comparing, analyzing and exporting project data.

“This is the first time all the published data, as well as all associated articles, have been collected together in a public forum,” says Weaver, who heads Purdue's Nutrition Science Department.

Detterbeck lauded the collaboration with Catlin and the HUBzero team in building a hub for ITMIG.

“They've been partners in it,” he says. “They've thought about it as constructively as we have. It's been so much more than I ever expected.”

In addition to the hub's central role in collecting the data and making it available for analysis, Detterbeck says HUBzero's built-in features for building research communities and publishing research results are key for ITMIG.

“It is not just the database, it's the whole platform,” he says.

More Information

New database system could aid research in science, engineering

Originally posted: July 1, 2014 4:30pm EDT