Causal Imitation Learning under Expert-Observable and Expert-Unobservable Confounding

Daqian Shao; Thomas Kleine Buening; Marta Kwiatkowska

arXiv:2502.07656·cs.LG·February 2, 2026

TL;DR

This paper introduces a causal imitation learning framework that handles hidden confounders by using trajectory histories as instruments, and presents DML-IL, an algorithm that outperforms existing causal IL baselines in continuous state-action environments.

Contribution

It develops a unified causal IL framework with hidden confounders and proposes DML-IL, an algorithm based on instrumental variable regression, with an upper bound on its imitation gap and empirical gains over existing causal IL baselines.

Findings

01

DML-IL outperforms existing causal IL baselines in Mujoco tasks.

02

The framework handles two types of hidden confounders: variables observed by the expert but not by the imitator, and confounding noise hidden from both.

03

The approach reformulates causal IL as a Conditional Moment Restriction problem.

Abstract

We propose a general framework for causal Imitation Learning (IL) with hidden confounders, which subsumes several existing settings. Our framework accounts for two types of hidden confounders: (a) variables observed by the expert but not by the imitator, and (b) confounding noise hidden from both. By leveraging trajectory histories as instruments, we reformulate causal IL in our framework into a Conditional Moment Restriction (CMR) problem. We propose DML-IL, an algorithm that solves this CMR problem via instrumental variable regression, and upper bound its imitation gap. Empirical evaluation on continuous state-action environments, including Mujoco tasks, demonstrates that DML-IL outperforms existing causal IL baselines.
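
To make the role of an instrument concrete, here is a minimal toy sketch of linear instrumental variable regression in plain Python. It is not the paper's DML-IL algorithm. The linear model, the variable roles (`z` as a stand-in for trajectory history, `u` as additive confounding noise, `x` as the state, `y` as the expert action), and all function names are assumptions made for illustration. It solves the scalar moment condition E[(y - slope * x) * z] = 0, a simple special case of a conditional moment restriction.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Naive regression slope; biased when x is correlated with the noise in y."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """IV (ratio) estimator with one instrument z: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def simulate(n=20000, true_slope=2.0, seed=0):
    """Hypothetical toy data: u confounds both x and y; z affects y only via x."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # instrument
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # hidden additive noise
    x = [zi + ui for zi, ui in zip(z, u)]        # regressor, confounded by u
    y = [true_slope * xi + ui for xi, ui in zip(x, u)]  # outcome with additive u
    return z, x, y


z, x, y = simulate()
naive = ols_slope(x, y)
iv = iv_slope(z, x, y)
print(f"naive slope: {naive:.3f}, IV slope: {iv:.3f}, true slope: 2.000")
```

In this toy setup the naive slope is pulled away from the true value because `x` and the noise in `y` share `u`, while the instrument-based estimate is not, since `z` is independent of `u`.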

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 3

Strengths

- Casting imitation under confounding as a CMR with trajectories as instruments is conceptually simple and lets the authors leverage existing IV/CMR machinery.
- The paper shows corollaries recovering Swamy et al. results as special cases.

Weaknesses

A confusing point of this paper is the arrow from $u_\epsilon$ to $a$, which means that even the expert does not have full control over its actions: the expert might choose to take one action, but another action takes place in the end because of $u_\epsilon$. I did not get how this is justified.

---

> "We assume the confounding noise is additive to the action, which is standard in causal inference. Without this assumption, the causal effect becomes unidentifiable."

I do not think that this is re

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- Originality: The distinction between expert-observable and expert-unobservable confounders provides a novel and more realistic problem formulation that generalizes several prior works. The neat reframing of causal IL into instrumental variables and CMR problems is elegant and theoretically well-motivated.
- Quality: The theoretical analysis is solid, with the neat theory on the imitation gap bound (Theorem 4.5) that recovers prior results as special cases (Corollaries 4.6 and 4.7).
- Clarity:

Weaknesses

- Limited scope of unification: The framework doesn't truly subsume Vuorio et al. (2022), which solves the problem through environment interaction. The experimental results don't include comparisons to interactive scenarios, limiting the claim of being a "unifying" framework. The paper focuses exclusively on offline IL from fixed demonstrations.
- Gap between theory and experiments: No analysis is provided comparing experimental results to the theoretical bounds from Theorem 4.5, making it uncle

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper is clearly structured and presents a compelling case for a more holistic approach to addressing hidden confounding in imitation learning. The proposed framework elegantly subsumes several prior settings, and the experimental results convincingly validate the method’s effectiveness across diverse environments. The theoretical analysis is rigorous and meaningfully connects to established results in causal inference and econometrics.

Weaknesses

- The framework relies on two key assumptions that may limit its applicability in certain domains. First, the additive noise assumption (i.e., that unobservable confounders affect actions additively) is essential for identifiability but may be overly restrictive in complex systems where interactions are nonlinear or multiplicative (e.g., biological or financial systems). While this assumption is standard in instrumental variable literature, its validity must be carefully assessed per application

Taxonomy

Topics: Machine Learning and Algorithms · AI-based Problem Solving and Planning · Imbalanced Data Classification Techniques

Methods: Sparse Evolutionary Training