IL-SOAR : Imitation Learning with Soft Optimistic Actor cRitic

Stefano Viel; Luca Viano; Volkan Cevher

arXiv:2502.19859·cs.LG·June 2, 2025

TL;DR

The paper presents the SOAR framework for imitation learning, which enhances policy learning by incorporating multiple critics for uncertainty estimation and optimism, leading to improved performance and sample efficiency.

Contribution

Introduces the SOAR framework that integrates multiple critics and optimism into imitation learning, providing theoretical guarantees and practical improvements over existing methods.

Findings

01 SOAR halves the number of episodes needed to reach the same performance.

02 SOAR boosts the performance of Soft Actor Critic-based imitation learning algorithms in MuJoCo environments.

03 In the tabular setting, SOAR yields a provable algorithm with guarantees.

Abstract

This paper introduces the SOAR framework for imitation learning. SOAR is an algorithmic template that learns a policy from expert demonstrations with a primal-dual style algorithm that alternates cost and policy updates. Within the policy updates, the SOAR framework uses an actor-critic method with multiple critics to estimate the critic uncertainty and build an optimistic critic, which is fundamental to driving exploration. When instantiated in the tabular setting, we get a provable algorithm with guarantees that match the best known results in $ϵ$. Practically, the SOAR template is shown to consistently boost the performance of imitation learning algorithms based on Soft Actor Critic, such as f-IRL, ML-IRL and CSIL, in several MuJoCo environments. Overall, thanks to SOAR, the number of episodes required to achieve the same performance is reduced by half.
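The idea of turning several critics into one optimistic critic can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that ensemble disagreement (the standard deviation across critics) serves as the uncertainty estimate, and that optimism means shifting the mean estimate in the favorable direction. The function name `optimistic_value`, the coefficient `beta`, and the `maximize` flag are hypothetical and do not come from the paper.

```python
from statistics import mean, pstdev


def optimistic_value(critic_estimates, beta=1.0, maximize=True):
    """Combine per-critic estimates for one (state, action) pair.

    critic_estimates: list of floats, one value per critic.
    beta: hypothetical exploration coefficient (an assumption, not from the paper).
    maximize: True for a reward convention, False for a cost convention.
    """
    mu = mean(critic_estimates)
    # Disagreement between critics is used as a proxy for uncertainty.
    sigma = pstdev(critic_estimates)
    # Optimism: move the estimate toward the favorable side of the uncertainty.
    return mu + beta * sigma if maximize else mu - beta * sigma


# Three critics disagree about the same (state, action) pair.
estimates = [1.0, 2.0, 3.0]
opt_reward = optimistic_value(estimates)                # above the ensemble mean
opt_cost = optimistic_value(estimates, maximize=False)  # below the ensemble mean
```

When all critics agree, the standard deviation is zero and the optimistic estimate coincides with the mean, so the bonus only matters for state-action pairs where the critics are still uncertain.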

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning

Methods: Dense Connections · Experience Replay · Adam · Soft Actor Critic