Overview and practical recommendations on using Shapley Values for identifying predictive biomarkers via CATE modeling

David Svensson; Erik Hermansson; Nikolaos Nikolaou; Konstantinos Sechidis; Ilya Lipkovich

arXiv:2505.01145·stat.ME·May 5, 2025

Open Access

TL;DR

This paper explores how Shapley Values can be effectively used to identify predictive biomarkers through CATE modeling, addressing computational challenges and benchmarking different meta-learner strategies.

Contribution

It introduces a surrogate estimation method for SHAP in multi-stage CATE models that reduces computational load and is strategy-agnostic.
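The following is a minimal sketch of one plausible reading of the surrogate idea, not the authors' implementation: estimate CATE with a multi-stage learner, fit a single surrogate model to the CATE predictions, and compute Shapley values on that surrogate only. Everything here is an assumption made for illustration: the simulated trial, the T-learner with linear base models, the linear surrogate, and the helper names `fit_linear`, `predict_linear`, `surrogate_predict`, and `shapley_values`. Shapley values are computed by brute-force enumeration, which is only feasible for a handful of covariates.

```python
import itertools
import math

import numpy as np

rng = np.random.default_rng(0)

# Simulated randomized trial; the data-generating model is illustrative only.
n, p = 2000, 4
X = rng.normal(size=(n, p))
T = rng.integers(0, 2, size=n)
# X[:, 0] is prognostic only; X[:, 1] is predictive (it modifies the treatment effect).
Y = X[:, 0] + T * (1.0 + 2.0 * X[:, 1]) + rng.normal(scale=0.5, size=n)


def fit_linear(features, target):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_linear(coef, features):
    return coef[0] + features @ coef[1:]


# Stage 1 (T-learner): one outcome model per arm, CATE = difference of predictions.
coef_treated = fit_linear(X[T == 1], Y[T == 1])
coef_control = fit_linear(X[T == 0], Y[T == 0])
cate_hat = predict_linear(coef_treated, X) - predict_linear(coef_control, X)

# Stage 2 (surrogate, assumed): a single model regressing the CATE estimates on X.
surrogate_coef = fit_linear(X, cate_hat)


def surrogate_predict(features):
    return predict_linear(surrogate_coef, features)


def shapley_values(predict, x, background):
    """Exact Shapley values for one row x; absent features are averaged over background rows."""
    d = len(x)

    def value(subset):
        rows = background.copy()
        idx = np.array(subset, dtype=int)
        rows[:, idx] = x[idx]  # features in the coalition are fixed to their values in x
        return predict(rows).mean()

    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            weight = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi


background = X[:100]
explained = X[100:150]
phi_all = np.array([shapley_values(surrogate_predict, x, background) for x in explained])
importance = np.abs(phi_all).mean(axis=0)  # mean |Shapley value| per covariate
print(importance)
```

Ranking covariates by `importance` is one simple way to flag candidate predictive biomarkers; in this simulated setup the effect modifier `X[:, 1]` should dominate while the purely prognostic `X[:, 0]` should not. Because only the surrogate is explained, the same second step could in principle follow any meta-learner that outputs CATE predictions.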

Findings

1. Surrogate approach improves computational efficiency

2. Benchmarking shows effective biomarker identification

3. Method applicable across various CATE meta-learners

Abstract

In recent years, two parallel research trends have emerged in machine learning, yet their intersections remain largely unexplored. On one hand, there has been a significant increase in literature focused on Individual Treatment Effect (ITE) modeling, particularly targeting the Conditional Average Treatment Effect (CATE) using meta-learner techniques. These approaches often aim to identify causal effects from observational data. On the other hand, the field of Explainable Machine Learning (XML) has gained traction, with various approaches developed to explain complex models and make their predictions more interpretable. A prominent technique in this area is Shapley Additive Explanations (SHAP), which has become mainstream in data science for analyzing supervised learning models. However, there has been limited exploration of SHAP application in identifying predictive biomarkers through…
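For orientation, the two quantities the abstract refers to can be written in their standard textbook form. The notation is an assumption for illustration and is not taken from the paper: $Y(1)$ and $Y(0)$ are potential outcomes under treatment and control, $X$ is the covariate vector of dimension $p$, and $v_x(S)$ is the value of a model's prediction at $x$ when only the features in $S$ are known.

```latex
% CATE: expected treatment effect for patients with covariates x
\tau(x) = \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]

% Shapley value of feature j for the prediction at x
\phi_j(x) = \sum_{S \subseteq \{1,\dots,p\} \setminus \{j\}}
  \frac{|S|!\,(p - |S| - 1)!}{p!}
  \left[\, v_x(S \cup \{j\}) - v_x(S) \,\right]
```

The Shapley values sum to $v_x(\{1,\dots,p\}) - v_x(\emptyset)$, so when the explained model is an estimate of $\tau$, each $\phi_j(x)$ can be read as feature $j$'s share of the gap between the patient-level and the average predicted treatment effect.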

Taxonomy

Topics: Statistical Methods in Clinical Trials

Methods: Shapley Additive Explanations