# Assessing surrogate heterogeneity in real world data using meta-learners

**Authors:** Rebecca Knowlton, Layla Parast

PMC · DOI: 10.1515/jci-2025-0033 · 2026-02-23

## TL;DR

This paper introduces a framework, implemented with meta-learners, for assessing how surrogate marker strength varies with patient characteristics in non-randomized data.

## Contribution

The framework makes it possible to evaluate surrogate heterogeneity in observational data while accounting for confounding, something the few existing methods for non-randomized treatment or exposure do not support.

## Key findings

- The framework identifies covariate profiles for which the surrogate is a valid replacement for the primary outcome.
- The methods' performance is examined in a simulation study and a real-world application.
- The application examines heterogeneity in the surrogacy of hemoglobin A1c for fasting plasma glucose.

## Abstract

Surrogate markers are most commonly studied within the context of randomized clinical trials. However, the need for alternative outcomes also extends to real-world public health and social science research, where randomized trials are often impractical. While standard methods for evaluating surrogate markers largely rely on the assumption of randomized treatment, there is a significant gap in applying these techniques to observational data, where the central challenge shifts to managing confounding. The few methods that do allow for non-randomized treatment/exposure do not offer a way to examine surrogate heterogeneity with respect to patient characteristics. In this paper, we propose a framework to assess surrogate heterogeneity in non-randomized data and implement this framework using meta-learners. Our approach allows us to quantify heterogeneity in surrogate strength with respect to patient characteristics while accommodating confounders through the use of flexible, off-the-shelf machine learning methods. In addition, we use our framework to identify covariate profiles where the surrogate is a valid replacement of the primary outcome. We examine the performance of our methods via a simulation study and application to examine heterogeneity in the surrogacy of hemoglobin A1c as a surrogate for fasting plasma glucose.
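As a rough illustration of the meta-learner idea mentioned in the abstract, the sketch below uses a T-learner: one outcome model is fit per exposure arm, and the difference of the two predictions gives a covariate-conditional effect. The abstract does not say which meta-learner, base learners, or surrogate-strength measure the authors use, so everything here is an assumption: the simulated data, the single covariate `x`, the linear base learner, and the helper names `fit_line` and `t_learner` are all hypothetical. The sketch only estimates conditional effects on a surrogate `s` and a primary outcome `y` under confounding by `x`; it does not reproduce the paper's framework.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def t_learner(x, a, outcome):
    """Fit one outcome model per exposure arm; return x -> mu1(x) - mu0(x)."""
    arm1 = [(xi, oi) for xi, ai, oi in zip(x, a, outcome) if ai == 1]
    arm0 = [(xi, oi) for xi, ai, oi in zip(x, a, outcome) if ai == 0]
    b0_1, b1_1 = fit_line(*zip(*arm1))
    b0_0, b1_0 = fit_line(*zip(*arm0))
    return lambda x_new: (b0_1 + b1_1 * x_new) - (b0_0 + b1_0 * x_new)


# Hypothetical simulated observational data (not from the paper).
rng = random.Random(0)
n = 4000
x = [rng.random() for _ in range(n)]
# Exposure probability depends on x, so x confounds the exposure-outcome relation.
a = [1 if rng.random() < 0.2 + 0.6 * xi else 0 for xi in x]
# Assumed true conditional effects: 2 * x on the surrogate, 1 + x on the outcome.
s = [1.0 + xi + ai * 2.0 * xi + rng.gauss(0, 0.5) for xi, ai in zip(x, a)]
y = [0.5 * xi + ai * (1.0 + xi) + rng.gauss(0, 0.5) for xi, ai in zip(x, a)]

tau_s = t_learner(x, a, s)  # estimated effect of exposure on the surrogate
tau_y = t_learner(x, a, y)  # estimated effect of exposure on the primary outcome

for x_new in (0.1, 0.5, 0.9):
    print(f"x={x_new:.1f}  effect on S: {tau_s(x_new):.2f}  effect on Y: {tau_y(x_new):.2f}")
```

Because each arm's model conditions on `x`, the estimated effects are adjusted for the confounder under the usual no-unmeasured-confounding assumption. In practice the linear fit would be swapped for a flexible machine learning regressor, and a surrogate-strength measure would be built on top of these conditional effects.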

## Full-text entities

- **Diseases:** cancer (MESH:D009369), obese (MESH:D009765), AIDS (MESH:D000163)
- **Chemicals:** S (MESH:D013455), glucose (MESH:D005947), N (MESH:D009584), cholesterol (MESH:D002784)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12924684/full.md

---
Source: https://tomesphere.com/paper/PMC12924684