Machine Learning to understand and prevent disease

Barbara Engelhardt moves her computational and statistics lab to Gladstone

By IPP Bureau | September 17, 2021

An unimaginable amount of data is continually being generated by scientific experiments, longitudinal studies, clinical trials, and hospital records—but what can be done with all this information?

Barbara Engelhardt, PhD, is building machine-learning models and statistical tools to make use of that data and find ways to better understand, and even prevent, disease. She is now joining Gladstone Institutes as a senior investigator.

"Barbara is an innovator in computational biology," says Katie Pollard, PhD, director of the Gladstone Institute of Data Science and Biotechnology. "She brings vast expertise in statistical models and will help expand our machine-learning program. And she's a renowned graduate student mentor. We're thrilled she's joining our team."

Engelhardt is also a full professor at Princeton University, on leave this academic year. She completed her graduate studies in computer science at UC Berkeley and was a postdoctoral researcher at the University of Chicago before starting her first lab at Duke University.

"Since I first learned about Gladstone during my postdoc, it's always seemed like an oasis of amazing science," says Engelhardt. "I can't wait to start collaborating with all the scientists here."

Engelhardt's lab is not how you might picture a traditional science lab—one with cells, glass beakers, and microscopes. Instead, she runs what's called a dry lab, where her team uses powerful computers to analyze data through mathematical and computational approaches.

One of the group's focus areas is to understand how cells work together in the body. The researchers look at how cells pass information to one another, how they work as part of neighbourhoods, and how those neighbourhoods are structured. Ultimately, they are trying to understand exactly how changes within cells or their environment can lead to disease.

To do so, they work closely with biologists, geneticists, and bioengineers to obtain data from their scientific experiments, such as microscopy images and videos of cells interacting over time. Using these files, Engelhardt can examine, for instance, whether treating cells with drugs affects how cells communicate, or how to target cancer tumours as directly as possible with therapy.

"Sometimes, we ask for throwaway data or data that the scientists don't need for their studies, but from which we can still glean lots of valuable insights," explains Engelhardt. "Other times, we collaborate more closely with a team to help them build better techniques to improve their experiments."

In those cases, the process is iterative. Engelhardt's team will propose new approaches, the collaborators will try them and report back, and they'll continue to work together to find the method that can generate the best results.

Engelhardt also studies how traumatic events that occur in your life are stored in your cells, how they may affect your genome, and how this can eventually lead to disease.

"You essentially store traumatic events in your cells, like a battery," she says. "And then later in life, these traumas may lead to depression, type 2 diabetes, obesity, heart disease, or mental health problems."

Her team has been working with the Fragile Families and Child Wellbeing Study, for which nearly 5,000 unmarried mothers were recruited between 1998 and 2000, a sample that includes a large number of Black, Hispanic, and low-income families. Data has been collected over the past 22 years about the children, their mothers, and, when possible, their fathers.

"Unfortunately, though perhaps not surprisingly, these kids have been through a lot," says Engelhardt. "A large number of them have incarcerated fathers, they've witnessed or been involved in crime, they've experienced bullying at school, they've gone to bed hungry, and they've been evicted from their homes."

The instability in their lives has been recorded in their cells and shows up in chemical changes to their DNA, which was collected as part of the study. Engelhardt is using all the data available about these families to understand how traumatic events get stored in their cells and, ultimately, to find a way to erase the records and prevent disease outcomes.

"It's challenging to work with data from a group of individuals from such diverse backgrounds, but it's critical, and it's pretty exciting that we get to do it," she says.

The third strand of research for Engelhardt's lab is to build reinforcement learning methods. This is the approach often used to guide a robot wandering through a maze or to inform "decisions" made by self-driving cars. But Engelhardt is applying this framework to electronic health care record data.

Reinforcement learning involves three categories of information. The first is a set of states. In the context of a hospital patient, the state may include the patient's age, gender, heart rate, temperature, and diagnosed disease. The second category is a set of actions, which, in this case, would be the types of interventions that health care professionals might perform, such as putting a patient on a ventilator or giving them a particular drug. Finally, there's a reward function, which encodes the objectives of the patient's care. This could be ensuring that vital signs are stable, reducing a patient's temperature, taking them off the ventilator, or getting them discharged as soon as possible.

"Given those three things—the state, action, and reward—our goal is essential to design a protocol that will lead to the best rewards," Engelhardt says. "So by building a model that can analyze all that data, we want to predict a set of actions for a patient's given state that will lead to the best outcome for their health."

Applying reinforcement learning to patient data is much more complicated than using it for robotics or self-driving cars.

"With self-driving cars, we understand the state dynamics," she explains. "We know exactly what will happen if we turn the wheel in a certain direction. But with patients, if we give them a certain drug, we don't know precisely how their state will change as a result. So, my team is finding ways we can still predict the best intervention, despite this uncertainty."

Engelhardt's group is currently collaborating with two large hospitals that have provided anonymized electronic health care record data from nearly 400,000 patients. These data include 7,000 patients who have tested positive for Covid-19 in the past year.

"Half of these patients are Black, so we're specifically building models to understand the differences in how doctors treat Black and White patients and how this may lead to different outcomes," says Engelhardt.

To do so, her team is looking, for instance, at the resources spent on patients and at how that spending correlates with whether they die or are discharged from the hospital.

"I think we can learn overall lessons from this data too, particularly about how the hospital system can best tackle an emerging disease like Covid-19," she says. "We're hoping to build tools that will help doctors respond as quickly as possible to new diseases like Covid-19 in the future."
