Purdue researcher leverages AI in the fight against cancer
Scientists from the Purdue Institute for Cancer Research (PICR) are leveraging the world-class high-performance computing (HPC) resources provided by the Rosen Center for Advanced Computing (RCAC) to conduct bioinformatics research on cancer biology. The team is focusing on developing a suite of artificial intelligence (AI)-driven models that will have profound implications for cancer research and precision medicine.
Nadia Lanman is a research associate professor at PICR and manager of the Collaborative Core for Cancer Bioinformatics (C3B). She is also a computational biologist, applying her expertise in HPC and big data analysis to the fight against cancer. Lanman and her collaborator, Ananth Grama, a distinguished professor of Computer Science and Director of Purdue’s Institute for Physical AI, are working to develop technologies that can predict how an individual patient will respond to certain cancers and point to the optimal treatment strategy for that patient.
“The Purdue Institute for Cancer Research is a National Cancer Institute-designated Basic Laboratory Cancer Center,” says Lanman. “That means we do a lot of basic research. So we're the ones who are making the discoveries that then can later be translated or harnessed to develop newer treatments, screening methods, tools for surgeons to use et cetera.”
Lanman continues, “Personally, I'm really interested in disease progression, how disease progression occurs, and how we can harness what we learn to try to develop novel treatments that either halt the progression or slow the progression.”
Lanman has big aspirations for the future of oncology therapeutics. Her ultimate goal is the creation of digital twin models for individual cancer patients. To that end, Lanman and the C3B staff bioinformaticists have been developing numerous AI-driven models that together will work toward this goal. The team’s primary focus is urological cancer, though they also work on other cancer types when high-quality datasets are available. Currently they are researching bladder, prostate, and lung cancers, as well as a condition known as benign prostatic hyperplasia. The biggest project, the one working toward a digital twin for individual patients, involves bladder cancer.
“So the majority of bladder cancer deaths are from stage 4 metastatic bladder cancer,” says Lanman. “Currently, bladder cancer patients are all treated largely the same, which means we are likely overtreating some patients and potentially undertreating others. This is a problem because these treatments (combination chemotherapy or bladder cystectomies) have massive consequences for the patients. With an individualized digital twin, we could see which patients need aggressive treatment and which would benefit more from a less aggressive course. Also, using big data to nail down the mechanisms of why some patients develop metastatic disease and others do not could help us determine more effective future treatments or novel therapies.”
To help with this problem, Lanman and her team have developed a model that uses single-cell RNA sequencing data to predict which patients will go on to develop metastatic disease. This is a major step for precision medicine. By knowing who is likely to have more severe disease progression, doctors would know which patients need more aggressive treatments and earlier intervention.
Another recent AI model developed by Lanman, Grama, and their teams is called GeneFlow. GeneFlow is a spatial transcriptomic gene expression technology. Spatial transcriptomics is the study of the RNA transcripts produced by the genome while preserving their positional context within the tissue. Spatial transcriptomics has vastly deepened researchers’ understanding of gene expression within intact tissues, providing unparalleled insight into cell biology and disease mechanisms. Most research into machine learning (ML) approaches for spatial transcriptomics has followed a particular strategy: take histological images of cells and use ML methods to predict or infer the gene expression. GeneFlow is revolutionary in that it tackles the same problem in reverse, using AI to generate realistic histopathological images from existing gene expression data. This new generative AI model has immense potential to accelerate diagnosis and treatment in cancer patients.
“GeneFlow is a generative model that can take single- or multi-cell gene expression data and generate the associated morphological features,” says Lanman. “Basically, we’re going from RNA sequencing data to histological image data. And that is really important. There’s currently a wealth of data online where we only have this gene expression data—not the spatial context that the cells are in—and getting that spatial context through traditional spatial transcriptomics methods costs an arm and a leg. For example, I have a separate project where sequencing only 16 samples is going to cost over $70,000. So if we can get to a place where we don’t necessarily need to do that and can learn spatial context from the gene expression data, that would be really powerful.”

Lanman and her group, together with collaborators such as Ananth Grama, have produced other AI-driven models, each of which supports cancer biology research. AnnotateAnyCell is an open-source, semi-supervised deep learning framework that aims to reduce the time and effort required for cellular analysis in tissue samples by using AI to analyze and annotate whole slide images in digital pathology. PertFlow predicts the effects of drug intervention on gene expression and cellular morphology simultaneously, two effects that previously could only be predicted in isolation. Pert2Mol uses generative AI for drug discovery, accounting for transcriptomic data and morphological features and translating complex cellular responses into molecular drug designs. ENBED (Ensemble Nucleotide Byte-level Encoder-Decoder) is a foundation model developed through a collaboration led by Dr. Vaneet Aggarwal, a Purdue Institute for Cancer Research member and Professor of Industrial Engineering, and his student, Dr. Aditya Malusare; the model analyzes nucleotide sequences and is particularly effective at predicting genomic mutations. The list goes on, and it is as impressive as it is extensive.
“We're developing models at different scales,” says Lanman. “So from the molecular to the cellular to the tissue level, all the way up to the individual level. And so really all of these different models are going to hopefully lead us to be able to develop that digital twin.”
As a computational biologist, Lanman relies on access to HPC resources to conduct her research, and for that, she turns to RCAC. Lanman is a veteran RCAC user. Over the years, she has used numerous campus systems: Snyder, Halstead, Bell, Rossmann, Negishi, Gilbreth, Gautschi, and even Anvil, the NSF-funded national supercomputer. Currently, Lanman and her group use Bell and Negishi for less intensive work, Rossmann for projects with heightened data security requirements, and Gilbreth for lighter-weight AI-based workflows. Gautschi, Purdue’s newest and most powerful supercomputer to date, was designed specifically to enable world-class AI research, and it is what Lanman uses for the bulk of her work.
“We love Gautschi,” says Lanman. “We love it. We're going through our GPU hours really fast, but we've also gotten a lot of manuscripts published and submitted using it, so that’s great.”
Another crucial aspect of RCAC’s services is the expertise provided by the support team. Lanman and the C3B team have taken advantage of the AI and Life Sciences training sessions offered by RCAC. The group also works closely with Arun Seetharam, RCAC’s in-house bioinformatics expert.
“Arun helps us an immense amount, especially with package installations,” says Lanman. “It's nice to have the support to do things like that for the C3B team because otherwise, each of our employees would have to spend time individually installing packages, which would really slow down our progress. He has made a phenomenal impact.”
The collaboration between RCAC and Lanman is a shining example of the excellence that can be achieved through the confluence of Purdue’s Strategic Initiatives—in this instance, Purdue Computes and One Health. By investing in state-of-the-art facilities, cutting-edge resources, and world-class expertise, the university is fostering an environment where research can thrive, leading to scientific discoveries that have real-world impact.
To learn more about Purdue’s Strategic Initiatives and how the university is striving to build a better world, please visit: Strategic Initiatives
The Purdue Institute for Cancer Research (PICR) is committed to improving lives through pioneering scientific discoveries that deepen the world’s understanding of cancer. The institute advances cutting-edge technologies and medicines to detect, prevent, and treat cancer while empowering tomorrow’s cancer research workforce. PICR is part of Purdue’s One Health Initiative, which integrates cancer research into a broader effort to understand the shared drivers of human, animal and plant health.
The Gautschi cluster was built through a partnership with Dell, AMD, DDN, and Nvidia, thanks to support from Purdue Computes and the Institute for Physical AI (IPAI). To purchase access to the Gautschi Community Cluster today, please visit RCAC’s Cluster Access Purchase page. IPAI is currently offering a matching program for Gautschi-AI, with IPAI matching a one-year allocation of one GPU for each GPU purchased (up to 8 GPUs). To take advantage of the matching program, all researchers need to do is provide a written description of their project and how it relates to physical AI alongside their purchase order.
RCAC operates the centrally-maintained research computing resources at Purdue University, providing access to leading-edge computational and data storage systems as well as expertise and support to Purdue faculty, staff, and student researchers. To learn more about HPC and how RCAC can help you, please visit: https://www.rcac.purdue.edu/
Written by: Jonathan Poole, poole43@purdue.edu