Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints
Tian Xu, Chenyang Wang, Xiaochen Zhai, Ziniu Li, Yi-Chen Li, Yang Yu

TL;DR
This paper introduces Dual Q-DM, a non-adversarial imitation learning method that uses Bellman constraints to propagate high Q-values to states not covered by the demonstrations. This reduces compounding errors and outperforms prior approaches such as IQ-Learn and behavioral cloning.
Contribution
We propose Dual Q-DM, a novel non-adversarial imitation learning algorithm that incorporates Bellman constraints to improve generalization and eliminate compounding errors, with theoretical guarantees.
Findings
Dual Q-DM provably recovers expert actions beyond demonstrations.
Dual Q-DM is equivalent to adversarial imitation learning in theory.
Experimental results confirm reduced compounding errors and improved imitation quality.
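The finding that Dual Q-DM recovers expert actions beyond the demonstrations rests on Bellman constraints propagating high Q-values. The abstract contrasts this with IQ-Learn, which lowers all Q-values uniformly on undemonstrated states. The sketch below illustrates the difference on a toy state. It is not Dual Q-DM, whose primal-dual objective is not given here. The helpers `uniform_suppression` and `bellman_targets`, the two-action setup, and the values `V_EXPERT`, `V_OFF` and `GAMMA` are hypothetical illustrations. The policy is a Boltzmann (softmax) policy over Q-values, as in soft-Q methods.

```python
import math


def softmax(q_values):
    # Boltzmann policy pi(a|s) proportional to exp(Q(s, a)), as in soft-Q methods.
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(q - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def uniform_suppression(q_values, delta):
    # Hypothetical helper: lower every action's Q-value at a state by the same delta.
    return [q - delta for q in q_values]


def bellman_targets(reward, gamma, next_state_values):
    # Hypothetical helper: one-step Bellman target r + gamma * V(s') per action,
    # where next_state_values[i] is V of the state that action i leads to.
    # Assumes, for simplicity, the same reward for every action.
    return [reward + gamma * v for v in next_state_values]


# Hypothetical toy numbers: an undemonstrated state with two actions.
# Action 0 leads back to a demonstrated (expert) state; action 1 drifts away.
V_EXPERT, V_OFF, GAMMA = 10.0, 0.0, 0.9
q_init = [0.0, 0.0]

pi_suppressed = softmax(uniform_suppression(q_init, delta=5.0))
pi_bellman = softmax(bellman_targets(0.0, GAMMA, [V_EXPERT, V_OFF]))
print(pi_suppressed)  # stays uniform: a common shift leaves the softmax unchanged
print(pi_bellman)     # strongly prefers the action that returns to expert states
```

The softmax is invariant to adding the same constant to every action's Q-value. Uniform suppression therefore cannot rank actions on an undemonstrated state. A Bellman backup, by contrast, lets the high value of a demonstrated successor state flow back to the action that reaches it.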
Abstract
Adversarial imitation learning (AIL) achieves high-quality imitation by mitigating the compounding errors of behavioral cloning (BC), but it often exhibits training instability due to adversarial optimization. To avoid this issue, a class of non-adversarial Q-based imitation learning (IL) methods, exemplified by IQ-Learn, has emerged. These methods are widely believed to outperform BC by leveraging online environment interactions. However, this paper revisits IQ-Learn and demonstrates that it provably reduces to BC. It also admits an imitation gap lower bound with quadratic dependence on the horizon, and therefore still suffers from compounding errors. Theoretical analysis reveals that, despite using online interactions, IQ-Learn uniformly suppresses the Q-values of all actions on states not covered by the demonstrations, and thus fails to generalize. To address this limitation, we introduce a primal-dual…
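A classic toy model illustrates why a quadratic dependence on the horizon signals compounding errors. It is an illustrative assumption, not the paper's lower-bound construction. While on the expert's state distribution, the learner errs with probability `eps` per step. In the worst case it never recovers after its first error and pays cost 1 on every remaining step. The function names below are hypothetical.

```python
def expected_cost_no_recovery(eps: float, horizon: int) -> float:
    # Toy worst case (assumption): after the first error the learner never
    # returns to expert states and pays cost 1 on every remaining step.
    total = 0.0
    for t in range(1, horizon + 1):
        # P(at least one error within the first t steps) = 1 - (1 - eps)^t
        total += 1.0 - (1.0 - eps) ** t
    return total


def expected_cost_with_recovery(eps: float, horizon: int) -> float:
    # Contrast: if the learner returns to expert states right after each
    # error, every step independently costs eps in expectation.
    return eps * horizon


if __name__ == "__main__":
    eps = 0.01
    for horizon in (50, 100, 200):
        print(horizon,
              round(expected_cost_no_recovery(eps, horizon), 2),
              round(expected_cost_with_recovery(eps, horizon), 2))
```

With `eps = 0.01` and a horizon of 100, the no-recovery cost is about 37.2, compared with 1.0 when the learner recovers. Since 1 - (1 - eps)^t <= eps * t, the no-recovery cost is bounded by eps * H * (H + 1) / 2 and approaches that quadratic bound when eps * H is small. With recovery, the cost grows only linearly in H.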
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
