Causal Imitation Learning under Temporally Correlated Noise
Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu

TL;DR
This paper introduces two novel algorithms for imitation learning from demonstration data corrupted by temporally correlated noise. Both use instrumental variable regression to recover accurate policies without access to an interactive expert.
Contribution
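It applies instrumental variable regression (IVR) techniques from econometrics to imitation learning and proposes two algorithms: DoubIL, which can use a simulator, and ResiduIL, which runs entirely offline. Both improve robustness to temporally correlated noise in the recorded expert actions.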
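To show what IVR changes relative to plain regression, the block below writes out the standard instrumental-variable setup. The notation (expert policy \pi_E, action noise u_t, instrument z_t, linear parameter \theta) is our own and is assumed for illustration; it is not copied from the paper, and the exact objectives of DoubIL and ResiduIL may differ. Behavioral cloning is consistent only if the noise is mean-zero given the state. Temporally correlated noise violates this, because the current state already carries part of the earlier, correlated noise. IVR instead uses an instrument z_t (for example a sufficiently old past state, assuming the noise correlation dies out before it) that is correlated with s_t but independent of u_t:

```latex
\begin{aligned}
&a_t = \pi_E(s_t) + u_t, \qquad \mathbb{E}[u_t \mid s_t] \neq 0 \quad \text{(confounded by correlated noise)} \\
&\hat{\pi}_{\mathrm{BC}} = \arg\min_{\pi} \, \mathbb{E}\big[(a_t - \pi(s_t))^2\big] \quad \text{(biased toward the spurious correlation)} \\
&\mathbb{E}\big[a_t - \pi(s_t) \,\big|\, z_t\big] = 0, \qquad z_t \perp u_t, \qquad \mathrm{Cov}(z_t, s_t) \neq 0 \\
&\pi(s) = \theta^\top s,\ \dim z_t = \dim s_t: \qquad
\hat{\theta}_{\mathrm{IV}} = \Big(\textstyle\sum_t z_t s_t^\top\Big)^{-1} \textstyle\sum_t z_t a_t
\end{aligned}
```

The last line is the classical linear, just-identified special case. The abstract describes DoubIL and ResiduIL as generative-modeling and game-theoretic variants of IVR, respectively.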
Findings
Both algorithms compare favorably to behavioral cloning on simulated control tasks.
ResiduIL and DoubIL mitigate the effects of temporally correlated noise.
Neither algorithm requires access to an interactive expert.
Abstract
We develop algorithms for imitation learning from policy data that was corrupted by temporally correlated noise in expert actions. When noise affects multiple timesteps of recorded data, it can manifest as spurious correlations between states and actions that a learner might latch on to, leading to poor policy performance. To break up these spurious correlations, we apply modern variants of the instrumental variable regression (IVR) technique of econometrics, enabling us to recover the underlying policy without requiring access to an interactive expert. In particular, we present two techniques, one of a generative-modeling flavor (DoubIL) that can utilize access to a simulator, and one of a game-theoretic flavor (ResiduIL) that can be run entirely offline. We find both of our algorithms compare favorably to behavioral cloning on simulated control tasks.
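The spurious-correlation argument can be made concrete with a minimal sketch in a hypothetical scalar linear system. This is our own toy setup, not an experiment from the paper. The recorded expert action is a_t = theta * s_t + u_t, where u_t = e_t + e_{t-1} is noise correlated across two consecutive timesteps. The noisy action is also executed, so the state s_t inherits u_{t-1} and is therefore correlated with u_t. Behavioral cloning (least squares of a_t on s_t) absorbs that correlation into its estimate. As a stand-in for the paper's IVR methods, the sketch uses classical two-stage least squares (2SLS) with the previous state s_{t-1} as the instrument. Under the assumed one-step noise correlation, s_{t-1} predicts s_t but is independent of u_t. The function names simulate, behavioral_cloning, and two_stage_least_squares, along with all parameter values, are illustrative assumptions. This is not DoubIL or ResiduIL.

```python
import random


def simulate(theta=-0.5, steps=200_000, noise_std=1.0, process_std=0.5, seed=0):
    """Roll out a hypothetical scalar system under a noisy linear expert.

    Toy assumptions (not from the paper):
      s_{t+1} = s_t + a_t + w_t,   a_t = theta * s_t + u_t,
    where u_t = e_t + e_{t-1} is MA(1) noise (correlated across two steps)
    and w_t is independent process noise.
    """
    rng = random.Random(seed)
    states, actions = [], []
    s, e_prev = 0.0, rng.gauss(0.0, noise_std)
    for _ in range(steps):
        e = rng.gauss(0.0, noise_std)
        u = e + e_prev        # temporally correlated action noise
        a = theta * s + u     # noisy expert action: recorded AND executed
        states.append(s)
        actions.append(a)
        s = s + a + rng.gauss(0.0, process_std)  # s_t now carries u_{t-1}
        e_prev = e
    return states, actions


def behavioral_cloning(states, actions):
    """Least-squares regression of a_t on s_t (no intercept; data is zero-mean)."""
    num = sum(s * a for s, a in zip(states, actions))
    return num / sum(s * s for s in states)


def two_stage_least_squares(states, actions, lag=1):
    """Classical 2SLS with the lagged state s_{t-lag} as the instrument."""
    z = states[:-lag]
    s = states[lag:]
    a = actions[lag:]
    # Stage 1: project the current state onto the instrument.
    pi = sum(zi * si for zi, si in zip(z, s)) / sum(zi * zi for zi in z)
    s_hat = [pi * zi for zi in z]
    # Stage 2: regress the action on the projected (de-confounded) state.
    return sum(h * ai for h, ai in zip(s_hat, a)) / sum(h * h for h in s_hat)


states, actions = simulate()
theta_bc = behavioral_cloning(states, actions)
theta_iv = two_stage_least_squares(states, actions)
print(f"BC estimate:   {theta_bc:.3f}")
print(f"2SLS estimate: {theta_iv:.3f}")
```

With these assumed settings, the behavioral-cloning estimate should land noticeably above the true theta = -0.5, pulled toward zero by the confounding. The 2SLS estimate should be close to -0.5. If u_t instead mixed e_t, ..., e_{t-k}, the instrument in this toy model would need a lag of at least k.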
Taxonomy
Topics: Reinforcement Learning in Robotics
