Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

Tian Xu; Chenyang Wang; Xiaochen Zhai; Ziniu Li; Yi-Chen Li; Yang Yu

arXiv:2603.22713 · cs.LG · March 25, 2026

Open Access

TL;DR

This paper introduces Dual Q-DM, a non-adversarial imitation learning method that uses Bellman constraints to propagate high Q-values, effectively reducing compounding errors and outperforming prior approaches like IQ-Learn and behavioral cloning.
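The TL;DR's phrase "propagate high Q-values" can be made concrete with a toy tabular example. The sketch below is an illustration under stated assumptions, not Dual Q-DM or IQ-Learn. The 7-state chain, the single demonstrated pair in `EXPERT_PAIRS`, the surrogate reward of 1 on demonstrated pairs, and all helper names are hypothetical. `bellman_q` enforces Q(s,a) = r(s,a) + γ·max_a' Q(s',a') by plain value iteration, so value flows from the demonstrated state to its neighbours. `suppressed_q` is a caricature of the failure mode the abstract attributes to IQ-Learn, in which every action on an uncovered state receives the same low Q-value.

```python
import numpy as np

N_STATES = 7
ACTIONS = np.array([-1, 0, 1])  # move left, stay, move right
GAMMA = 0.9
# Hypothetical demonstration: the expert stays (action index 1) at state 3.
EXPERT_PAIRS = {(3, 1)}


def next_state(s: int, a_idx: int) -> int:
    # Deterministic chain dynamics, clipped at both ends.
    return int(np.clip(s + ACTIONS[a_idx], 0, N_STATES - 1))


def surrogate_reward() -> np.ndarray:
    # Hypothetical reward: 1 on demonstrated (s, a) pairs, 0 everywhere else.
    r = np.zeros((N_STATES, len(ACTIONS)))
    for s, a in EXPERT_PAIRS:
        r[s, a] = 1.0
    return r


def bellman_q(r: np.ndarray, iters: int = 200) -> np.ndarray:
    # Value iteration: repeatedly enforce Q(s,a) = r(s,a) + gamma * max_a' Q(s',a').
    q = np.zeros_like(r)
    for _ in range(iters):
        v = q.max(axis=1)
        q = np.array([[r[s, a] + GAMMA * v[next_state(s, a)]
                       for a in range(len(ACTIONS))]
                      for s in range(N_STATES)])
    return q


def suppressed_q(r: np.ndarray) -> np.ndarray:
    # No Bellman coupling: every action on an uncovered state keeps the same low value.
    return r.copy()


r = surrogate_reward()
pi_bellman = bellman_q(r).argmax(axis=1)
pi_suppressed = suppressed_q(r).argmax(axis=1)
print("Bellman-consistent actions:", ACTIONS[pi_bellman].tolist())
print("Suppressed actions:        ", ACTIONS[pi_suppressed].tolist())
```

With the Bellman coupling, the greedy policy steers back toward state 3 from both sides ([1, 1, 1, 0, -1, -1, -1]). With uniformly suppressed values, actions on uncovered states are tied, `argmax` simply returns the first one, and states 0 to 2 drift away from the demonstration.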

Contribution

We propose Dual Q-DM, a non-adversarial imitation learning algorithm that incorporates Bellman constraints to generalize beyond the demonstrations and provably eliminate compounding errors.

Findings

1. Dual Q-DM provably recovers expert actions on states not covered by the demonstrations.

2. Dual Q-DM is theoretically equivalent to adversarial imitation learning (AIL).

3. Experiments confirm that Dual Q-DM reduces compounding errors and improves imitation quality.

Abstract

Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating compounding errors in behavioral cloning (BC), but often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, represented by IQ-Learn, has emerged and is widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC and suffers from an imitation gap lower bound with quadratic dependence on horizon, therefore still suffering from compounding errors. Theoretical analysis reveals that, despite using online interactions, IQ-Learn uniformly suppresses the Q-values for all actions on states uncovered by demonstrations, thereby failing to generalize. To address this limitation, we introduce a primal-dual…
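Why a quadratic dependence on the horizon signals compounding errors can be seen in a hypothetical toy model, the "lost forever" construction often used to illustrate behavioral cloning. It is not the paper's lower-bound construction. Assumptions (not from the paper): the learner errs with probability `eps` at each step. In the first model, a single error leaves the learner permanently off the expert's state distribution, costing 1 on every remaining step. In the second model, every error is corrected on the next step. The function names are illustrative.

```python
def lost_forever_cost(eps: float, horizon: int) -> float:
    """Expected cost when one mistake moves the agent off the expert's
    state distribution for good, costing 1 on every remaining step."""
    p_on_track = 1.0
    total = 0.0
    for _ in range(horizon):
        p_on_track *= 1.0 - eps      # probability of no mistake up to this step
        total += 1.0 - p_on_track    # cost 1 if a mistake has already happened
    return total


def recovering_cost(eps: float, horizon: int) -> float:
    """Expected cost when every mistake is corrected on the next step,
    so each step independently costs eps."""
    return eps * horizon


eps = 1e-4
for horizon in (100, 200, 400):
    print(horizon,
          round(lost_forever_cost(eps, horizon), 3),
          recovering_cost(eps, horizon))
```

Doubling the horizon roughly quadruples `lost_forever_cost`, which is about eps·H(H+1)/2 while eps·H is small, but only doubles `recovering_cost` (eps·H). This is the gap between quadratic and linear horizon dependence.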

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning