IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic
Stefano Viel, Luca Viano, Volkan Cevher

TL;DR
The paper presents the SOAR framework for imitation learning, which enhances policy learning by incorporating multiple critics for uncertainty estimation and optimism, leading to improved performance and sample efficiency.
Contribution
Introduces the SOAR framework that integrates multiple critics and optimism into imitation learning, providing theoretical guarantees and practical improvements over existing methods.
Findings
SOAR halves the number of episodes required to reach the same performance as the underlying baseline algorithms.
Consistently boosts the performance of Soft Actor Critic-based imitation learning algorithms (f-IRL, ML-IRL, CSIL) in several MuJoCo environments.
Provides a provable algorithm with guarantees in the tabular setting.
Abstract
This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations with a primal-dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework uses an actor-critic method with multiple critics to estimate the critic uncertainty and build an optimistic critic, which is fundamental for driving exploration. When instantiated in the tabular setting, we get a provable algorithm with guarantees that match the best known results in . Practically, the SOAR template is shown to consistently boost the performance of imitation learning algorithms based on Soft Actor Critic, such as f-IRL, ML-IRL and CSIL, in several MuJoCo environments. Overall, thanks to SOAR, the number of episodes required to achieve the same performance is reduced by half.
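The idea of using several critics to measure uncertainty and build an optimistic critic can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `beta` coefficient, the "mean minus `beta` times standard deviation" rule, and the cost-minimization convention are all assumptions made for illustration; the paper's exact construction may differ.

```python
import numpy as np


def optimistic_cost_to_go(q_ensemble, beta=1.0):
    """Combine several critics into one optimistic estimate (illustrative only).

    q_ensemble has shape (n_critics, n_states, n_actions); each slice is one
    critic's estimate of the expected cumulative cost. The rule
    "mean - beta * std" is an assumption, not taken from the paper.
    """
    q_ensemble = np.asarray(q_ensemble, dtype=float)
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)  # critic disagreement as an uncertainty proxy
    return mean - beta * std      # lower cost = optimistic under minimization


def soft_policy(q, temperature=1.0):
    """Softmax policy that prefers actions with a low (optimistic) cost."""
    logits = -q / temperature
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


# Two critics, one state, two actions. Both actions have the same mean cost,
# but the critics disagree only on action 1.
q_ensemble = np.array([
    [[1.0, 2.0]],
    [[1.0, 0.0]],
])
q_opt = optimistic_cost_to_go(q_ensemble, beta=1.0)
policy = soft_policy(q_opt)
```

In this toy example the two actions have identical mean cost, so a single averaged critic would be indifferent between them. The optimistic estimate lowers the cost of the action the critics disagree on, so the resulting policy tries it more often, which is the exploration effect the abstract attributes to the optimistic critic.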
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
Methods: Dense Connections · Experience Replay · Adam · Soft Actor Critic
