At the 2025 STAT Summit, Dr. David Jaffray, senior vice president and chief technology and digital officer at MD Anderson, joined Gideon Gil, managing editor at STAT News. Photo: Jeff Pinette

Ending cancer represents a high-stakes mission that requires urgent solutions. Through the Institute for Data Science in Oncology, The University of Texas MD Anderson Cancer Center is harnessing the full power of data science to accelerate the pace of progress.

At the 2025 STAT Summit, Dr. David Jaffray, senior vice president and chief technology and digital officer at MD Anderson, joined Gideon Gil, managing editor at STAT News, to discuss how the institute is integrating data science with research enterprises to drive transformative impacts in cancer care.


Gideon Gil: Welcome David, to the STAT Summit.

David Jaffray: Good to be here, Gideon.

Gideon Gil: At MD Anderson, your clinicians see roughly 187,000 patients with more than 400 types of cancer every year. Each individual’s care generates more than two gigabytes of data annually: imaging, test results, notes, treatment plans, you name it. And now you’re using that data to build an Institute for Data Science in Oncology. Tell us about that.

David Jaffray: It’s a really important mission we have, as you’re all aware. Our objective is to make cancer history. If we could close our doors, we’d all be happy because we wouldn’t need it anymore. And I’m sure you’d be happy about that too.

But the reality is that we need to work faster to bring cancer under control. And as you all are aware, with the advances in AI and computing, data is a powerful tool to try to understand how we can better take care of our patients, improve their experience, but also drive towards new cures. And one of the things we’ve focused on substantially in the organization is: how do we manage that data? How do we steward that data? How do we capture as much as possible?

The amount of data is growing remarkably. How do we make sure we can use that to help answer important questions?

So we built a new architecture that really flows the data within the organization and curates it, making sure it’s state-of-the-art. The applications are innumerable. Of course, we can look at experience, we can look at efficiency, we can get more patients in and out of the OR, which is important because the demand is endless.

But simultaneously, we can partner with pharmaceutical companies, for example. And we do that in a way that’s quite rigorous. We like to make sure that we’re tackling an important problem. Our clinicians get engaged, they understand the science issues, they understand the clinical issues, and then we think about how can we use those data assets to try to understand how we can move a cure forward faster?

And this for us has been central. Organizations today need to think about how they steward their data, how they build that trust. For us, patients and trust go together. They come to us for care, and we think about how we’re going to manage their data and steward it as part of that trust architecture.

And so building this capacity has been really key. And it’s remarkable to see the engagement it drives. New, young faculty realize they can come work with us, use these data to extract insights and help answer questions that are important for cancer.

Gideon Gil: Can you give me an example of how you’re using the data?

David Jaffray: Our clinicians often are involved in clinical trials. They’ll see a new therapeutic advance, and they see right in front of them patients that respond and don’t respond. But the rules for the trial are set, and the go/no-go decision is based on the outcome.

But they see nuances, they see subtle differences. Some patients respond and some patients don’t. So hypotheses are forming in their heads about why that is the case. Simultaneously, the pharmaceutical company has already put a ton of money into getting that drug into the clinic to a certain state, maybe successful, maybe not.

And so you take those insights and you pair them with the investment in data and how we collect it and curate it. And that, in combination with state-of-the-art data science methods that let us simulate possible outcomes for different trials and different cohorts of patients, makes a very powerful combination where we can identify, for example, patients with specific mutations that should have responded but didn’t respond.

Why? What other biomarker could we identify? What other genetic characteristic or proteomic marker would tell us which patients would respond and which ones wouldn’t?

And this is not just like an activity we want to do helter-skelter. We really want to think about how we industrialize this activity in partnership with our patients, because they’re all compelled — they don’t want someone else to go through the same treatment. So how do we learn from one patient so that we can treat the next ones better?

I think the most interesting part of this is this mind shift around how we manage data. Today health care data is used in a lot of different ways. We’re very particular. We really think that the stewardship role is really important, even to the point where we’re working with a little startup company that’s looking at dynamic consent technology that will allow the patients to say, “Yes, please use our data in this way because we’re compelled by the case, by the mission.”

And moving forward, I think organizations need to think about that. How do they manage and steward data internally? And how do we engage with patients to make sure that they’re okay with how we’re using data to advance a cause like cancer?

Gideon Gil: When we talked a few weeks ago, you spoke about your hopes for creating digital twins for every patient. How would that work, and why are you so excited about their potential?

David Jaffray: Yeah, we have about 1.7 million visits a year at MD Anderson, and every one of those visits is loaded with some kind of decision; either the patient’s decision, the family is sometimes involved in the decision, the provider [is] involved in the decision. It’s full of decisions.

And so we ask ourselves, “How can we use the data we have, and the data we’re going to collect next? How do we use that data to help inform every decision we make?”

And we believe that computational approaches, advances in data science, including things like machine learning and AI, but other more mechanistic modeling approaches will give that decision making an edge. It doesn’t have to be 100%. It’s like house odds. If you get 2% or 3% improvements in your decision, your predictability, that compounds.
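As a rough illustration of the "house odds" point, the sketch below shows how a small per-decision edge can compound. It assumes, purely for illustration, that each decision's improvement is independent and multiplies with the others; that assumption, the function name `compounded_gain`, and the decision counts are hypothetical and are not taken from MD Anderson's models.

```python
def compounded_gain(per_decision_gain: float, n_decisions: int) -> float:
    """Overall multiplicative improvement after n sequential decisions.

    Assumption (illustrative only): every decision carries the same
    relative gain and the gains multiply independently.
    """
    return (1.0 + per_decision_gain) ** n_decisions


# Compare a 2% and a 3% edge across a few hypothetical decision counts.
for gain in (0.02, 0.03):
    for n in (1, 10, 25):
        factor = compounded_gain(gain, n)
        print(f"{gain:.0%} edge over {n} decisions -> x{factor:.2f}")
```

Under these assumptions, a 2% edge repeated over 10 decisions works out to roughly a 1.22x improvement, and a 3% edge over 25 decisions to roughly 2.09x. Real clinical decisions are neither identical nor independent, so this is only a way to see why small gains matter.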

Our vision through the Institute for Data Science in Oncology, which we’ve been developing over the past couple of years, is to bring data science to every single decision we make. And it’s pervasive.

It’s everywhere from in the laboratory understanding, “Do we advance this drug or this drug?” to in a clinic, “Is this the right care for you or is this the right care for you?” Or even at this societal level, “How do we bring more data to prevention and other forms of approaches to reduce cancer?”

Let’s go back to the digital twins. Well, the digital twin paradigm is this idea that we can customize those models to each individual patient. We’re all very similar. You can buy glasses off the rack, you can buy shirts off the rack, but there’s enough variation from each of us that if we’re going to inform those decisions, we need to capture data specific to individual patients.

And then we can assemble models based on previous treatments for patients to try to understand what’s a likely trajectory of this patient given this course of care? Is that likely to be successful, or should we consider an alternative?

And increasingly, I think that this computational modeling paradigm is going to become expected. We have rich data sets, like I saw in the previous session. We know so much more about patients, and with that we can inform these models.

They may be very basic to start with. We have some great examples where we see prostate cancer patients with a prostate-specific antigen, or PSA, climbing gradually through time. Wouldn’t it be great to have a computational model that suggests, “Now you should go for another test because it’s climbing” or “Based on that we’ve seen a fall in your PSA. You’re good.”

Rather than just have a series of tests, put together a computational model or a digital twin that could be used in consultation between the patient and the provider. Of course, there’s uncertainty. We’ll have to teach people about uncertainty in decision-making, but I think humans are more than capable of doing that.

Gideon Gil: Your background is in physics and radiation oncology, where you worked with algorithms to guide the administration of therapy. How does that shape your thinking about the need for thoughtful adoption of AI technologies? And looking forward, what are you most excited about?

David Jaffray: AI is pretty exciting. I’ve been working in basements of cancer centers coding since 1986, that’s almost 40 years. This is the bucking bronco of technology if there ever was one. But it just comes in all different forms and shapes, it’s quite disturbing and simultaneously very exciting.

Health care organizations aren’t built to manage computation. They’re barely built to manage tech stacks. You’re going to have thousands of AI algorithms running in your organization if you don’t already today. Who’s watching them? Who knows whether the data going in is the same as it was when it was trained?

Who’s understanding that if you had this algorithm running and you decided it was good, and you put another algorithm in front of it, is this one still doing what it’s supposed to be doing? We don’t have that skill set. The companies won’t bring you that skill set, the FDA is not going to bring you that skill set.
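One common way to approach the question of whether the data going into a model still looks like the data it was trained on is a drift check. The sketch below computes a Population Stability Index (PSI) for a single numeric input. It is a minimal illustration under stated assumptions, not a description of any tooling MD Anderson uses: the function name `psi`, the equal-width binning, the `eps` floor, and the sample values are all hypothetical.

```python
import math
from bisect import bisect_right


def psi(train, live, n_bins=10, eps=1e-6):
    """Population Stability Index between a training and a live sample.

    Bins are equal-width over the training range; live values outside
    that range fall into the first or last bin. Empty bins are floored
    at eps so the logarithm stays defined (an illustrative choice).
    """
    lo, hi = min(train), max(train)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data
    edges = [lo + width * i for i in range(1, n_bins)]  # interior edges

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    p, q = fractions(train), fractions(live)
    # Each term is non-negative, so PSI is 0 only when the bins match.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical feature values: one live sample matches training, one has shifted.
train_values = [i / 100 for i in range(100)]
shifted_values = [0.5 + i / 200 for i in range(100)]

print("same distribution:", psi(train_values, train_values))
print("shifted distribution:", psi(train_values, shifted_values))
```

A frequently cited rule of thumb treats a PSI above about 0.25 as a sign of substantial shift, though any threshold would need to be validated for the specific model and setting. A check like this covers only one input at a time and says nothing about whether the model's outputs remain clinically sound.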

Organizations have to prepare themselves for stewarding thousands of algorithms, and for the thoughtful assembly of those algorithms so we can be confident and patients can be safe. That skill set is not in the IT teams. The physicians are very good at decision-making, they’re trained on it, but they’re not necessarily trained at managing thousands of decision-making or inference solutions within health care.

We’re looking at how do we build the maturity in organizations to be able to actually capitalize on the substantial promise that’s been presented to us from AI? And that’s a lift. We’ll have to get there though. We have to invest a little bit to get the return of AI in health care.

But the conversations are going in the right direction, people are realizing. And if you’re with health care organizations, you should be asking how do you govern your AI? We have human resources. Where is the Department of Robotic Resources that’s going to help us understand how these work together?

What am I most excited about? I’m most excited about the fact that this technology is going to accelerate cures. Super exciting. I was just with a little startup, raised $300 million in Series A. I guess that’s what I mean by “little startup” nowadays. They’re bringing robots together with AI for generating hypotheses in collaboration with humans.

And we’ll see this combination of synthetic hypotheses by computers being shared with humans and thought through, and executed with incredible efficiency through robotic methods. And that’s going to just produce a huge opportunity for very important cures that will be able to allow us to close the doors at MD Anderson. That’s exciting.

Gideon Gil: Well, thank you very much. Appreciate you being here.

David Jaffray: Thanks everybody.