RILe: Reinforced Imitation Learning

Mert Albaba; Sammy Christen; Thomas Langerak; Christoph Gebhardt; Otmar Hilliges; Michael J. Black

arXiv:2406.08472·cs.LG·April 22, 2025

Open Access·3 Reviews

TL;DR

RILe is a novel reinforcement learning framework that combines imitation and inverse reinforcement learning to efficiently learn dense reward functions, enabling high-performance policies in complex high-dimensional tasks.

Contribution

RILe introduces a trainer-student framework that adaptively learns reward functions to improve imitation learning in high-dimensional environments.
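This listing does not give the paper's networks, losses, or update rules, so the following is only a toy sketch of the data flow the reviews describe: a student acts, a trainer emits the student's reward, and the trainer takes its own learning signal from a discriminator. Everything concrete here is an assumption made for illustration: the three-action bandit, the fixed `discriminator` stand-in, the tabular trainer, the "agree with the discriminator" trainer objective, the learning rates, and the `freeze_after` parameter (a loose nod to the trainer-freezing strategy the reviews mention).

```python
import math
import random

N_ACTIONS = 3
EXPERT_ACTION = 2  # hypothetical: the expert always picks action 2


def discriminator(action: int) -> float:
    """Fixed stand-in for a learned discriminator: P(action looks expert-like)."""
    return 0.9 if action == EXPERT_ACTION else 0.1


def run(steps: int = 500, freeze_after: int = 250, seed: int = 0):
    """Run the toy loop; return (student preferences, trainer reward table)."""
    rng = random.Random(seed)
    student_prefs = [0.0] * N_ACTIONS   # student policy: softmax over preferences
    trainer_reward = [0.0] * N_ACTIONS  # trainer: one emitted reward per action

    for step in range(steps):
        # Student acts by sampling from a softmax over its preferences.
        weights = [math.exp(p) for p in student_prefs]
        action = rng.choices(range(N_ACTIONS), weights=weights)[0]

        # The student is rewarded by the trainer, not by the discriminator.
        reward = trainer_reward[action]
        student_prefs[action] += 0.1 * reward

        # Assumed trainer objective: agree with the discriminator's verdict,
        # rescaled from (0, 1) to (-1, 1). After freeze_after steps the
        # trainer stops updating and the student keeps learning against it.
        if step < freeze_after:
            target = 2.0 * discriminator(action) - 1.0
            trainer_reward[action] += 0.5 * (target - trainer_reward[action])

    return student_prefs, trainer_reward


if __name__ == "__main__":
    prefs, rewards = run()
    print("student preferences:", prefs)
    print("trainer rewards:", rewards)
```

In this sketch the student never sees the discriminator score directly; it only sees what the trainer chooses to emit. That separation is the point being illustrated, not the specific update rules.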

Findings

01. Outperforms existing methods in robotic locomotion tasks

02. Achieves near-expert performance in complex high-dimensional environments

03. Provides a dynamic reward signal that enhances learning efficiency

Abstract

Acquiring complex behaviors is essential for artificially intelligent agents, yet learning these behaviors in high-dimensional settings poses a significant challenge due to the vast search space. Traditional reinforcement learning (RL) requires extensive manual effort for reward function engineering. Inverse reinforcement learning (IRL) uncovers reward functions from expert demonstrations but relies on an iterative process that is often computationally expensive. Imitation learning (IL) provides a more efficient alternative by directly comparing an agent's actions to expert demonstrations; however, in high-dimensional environments, such direct comparisons often offer insufficient feedback for effective learning. We introduce RILe (Reinforced Imitation Learning), a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 4

Strengths

- Clear, appealing reframing: optimizing a teacher policy instead of using a static discriminator reward.
- Practical promise: dynamic reward shaping within a single loop; results are competitive/strong on high-dimensional control.
- Modular: can plug in stronger discriminators and, in principle, other evaluators.

Weaknesses

- Baseline oddity: In several tasks GAIL is the strongest/near-strongest baseline. For a 2016 method, this is atypical and raises concerns about coverage (missing stronger recent IL/IRL/offline-IL variants) and tuning fairness (architectures, budgets, entropy, normalization, replay).
- Sparsity not fundamentally addressed: The trainer still derives its teaching signal from the discriminator; when the student is far from expert support, feedback can remain sparse/off-manifold. The method looks like

Reviewer 02·Rating 6·Confidence 5

Strengths

1. The core idea of using an RL agent to learn a reward-generating policy is highly original and represents a paradigm shift from existing methods.
2. The paper is backed by an extensive set of experiments covering ablation studies, computational analysis, fairness comparisons, and performance evaluations on diverse benchmarks. The results are consistently strong.
3. By achieving high performance with significantly better computational efficiency than IRL methods, RILe offers a more prac

Weaknesses

1. The paper acknowledges that training the three-component system (student, trainer, discriminator) introduces stability challenges. While strategies like freezing the trainer are mentioned (and detailed in Appendix B), the main text lacks a discussion on how sensitive the method is to hyperparameters related to this stability (e.g., the frequency of freezing). A more quantitative analysis of this sensitivity would be helpful.
2. The comparison is excellent but could be even more compelling

Reviewer 03·Rating 4·Confidence 4

Strengths

- RILe leads to small but broad improvement across the board in Mujoco+
- In addition to the positive core result, there are a good number of experiments exploring the properties of various components of RILe (e.g. the reward function comparison (Fig 4), the impact of the function transform for the trainer agent's reward (Sec 5.1), the comparison of RILe to diffusion-based discriminators (Sec 5.5), and the study of the impact of noise).
- RILe leads to significantly improved robustness to noise

Weaknesses

- **The existing results in the paper do not fully convince me that the method is sound.** The main result (Fig 4) presents relatively small empirical improvements in MuJoCo that, in my view, do not justify the significant increase in algorithm complexity that RILe presents. Unstable multi-agent dynamics are already a challenge that adversarial imitation learning methods must cope with, even without introducing a third agent. The fact that various empirical tricks (e.g. freezing the trainer, tuning th

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Multimodal Machine Learning Applications