Causal K-Means Clustering

Kwangho Kim; Jisu Kim; Edward H. Kennedy

arXiv:2405.03083·stat.ME·April 7, 2026

TL;DR

This paper introduces Causal k-Means Clustering, a method that uncovers unknown subgroup structure in treatment effects by applying k-means clustering to counterfactual functions, which are themselves unknown and must be estimated.

Contribution

The paper develops two estimators for this clustering problem: a simple plug-in estimator, and a bias-corrected estimator based on nonparametric efficiency theory and double machine learning.

Findings

01. The plug-in estimator is easy to implement with off-the-shelf algorithms.

02. The bias-corrected estimator achieves fast root-n convergence and asymptotic normality.

03. Simulations and a real study demonstrate the method's practical utility.

Abstract

Causal effects are often characterized with population summaries. These might provide an incomplete picture when there are heterogeneous treatment effects across subgroups. Since the subgroup structure is typically unknown, it is more challenging to identify and evaluate subgroup effects than population effects. We propose a new solution to this problem: Causal k-Means Clustering, which leverages the k-means clustering algorithm to uncover the unknown subgroup structure. Our problem differs significantly from the conventional clustering setup since the variables to be clustered are unknown counterfactual functions. We present a plug-in estimator which is simple and readily implementable using off-the-shelf algorithms, and study its rate of convergence. We also develop a new bias-corrected estimator based on nonparametric efficiency theory and double machine learning, and show…
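
The plug-in idea described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes, as one reading of the abstract, that the counterfactual functions are the conditional mean outcomes under each treatment level, and that under standard no-unmeasured-confounding conditions they can be estimated by regressing the outcome on covariates within each treatment arm. The fitted values are then clustered with ordinary k-means. All names here (`fit_arm_regressions`, `kmeans`, `mu_hat`, the simulated data) are hypothetical, and per-arm least squares stands in for whatever regression learner one would actually use.

```python
import numpy as np


def fit_arm_regressions(X, A, Y, levels):
    """Regress Y on X separately within each treatment arm (least squares)
    and return an (n, len(levels)) array of fitted means for every unit."""
    design = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    mu_hat = np.empty((len(X), len(levels)))
    for j, a in enumerate(levels):
        in_arm = A == a
        beta, *_ = np.linalg.lstsq(design[in_arm], Y[in_arm], rcond=None)
        mu_hat[:, j] = design @ beta  # predict for all units, not just this arm
    return mu_hat


def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) per point."""
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        updated = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, assign(points, centers)


# Simulated data (an assumption for illustration only): the treatment
# changes the outcome only for units with G == 1.
rng = np.random.default_rng(1)
n = 2000
G = rng.integers(0, 2, size=n)                # covariate that defines the subgroups
X = np.column_stack([G, rng.normal(size=n)])  # second covariate is pure noise
A = rng.integers(0, 2, size=n)                # randomized binary treatment
Y = 2.0 * G * A + rng.normal(size=n)

# Step 1: estimate the conditional mean outcome under each treatment level.
mu_hat = fit_arm_regressions(X, A, Y, levels=[0, 1])
# Step 2: cluster units on their estimated outcome vectors.
centers, labels = kmeans(mu_hat, k=2)
# Estimated treatment effect within each cluster (treated minus control).
effects = centers[:, 1] - centers[:, 0]
```

In practice the per-arm least squares would likely be replaced by a flexible regression learner and the hand-written Lloyd loop by a library k-means routine. The bias-corrected estimator mentioned in the abstract is a separate construction and is not sketched here.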
