RILe: Reinforced Imitation Learning
Mert Albaba, Sammy Christen, Thomas Langarek, Christoph Gebhardt, Otmar Hilliges, Michael J. Black

TL;DR
RILe is a novel reinforcement learning framework that combines imitation and inverse reinforcement learning to efficiently learn dense reward functions, enabling high-performance policies in complex high-dimensional tasks.
Contribution
RILe introduces a trainer-student framework that adaptively learns reward functions to improve imitation learning in high-dimensional environments.
Findings
Outperforms existing methods in robotic locomotion tasks
Achieves near-expert performance in complex high-dimensional environments
Provides a dynamic reward signal that enhances learning efficiency
Abstract
Acquiring complex behaviors is essential for artificially intelligent agents, yet learning these behaviors in high-dimensional settings poses a significant challenge due to the vast search space. Traditional reinforcement learning (RL) requires extensive manual effort for reward function engineering. Inverse reinforcement learning (IRL) uncovers reward functions from expert demonstrations but relies on an iterative process that is often computationally expensive. Imitation learning (IL) provides a more efficient alternative by directly comparing an agent's actions to expert demonstrations; however, in high-dimensional environments, such direct comparisons often offer insufficient feedback for effective learning. We introduce RILe (Reinforced Imitation Learning), a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward…
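The division of labor among the student, trainer, and discriminator can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the linear scoring functions, the names `discriminator_score`, `trainer_emit_reward`, and `trainer_own_reward`, and in particular the agreement-based form of the trainer's own reward. It only shows the data flow in which the trainer, rather than the discriminator, supplies the reward the student optimizes.

```python
import math

def discriminator_score(pair, w_d):
    """Toy discriminator: estimated probability that a state-action pair is expert-like."""
    z = sum(x * w for x, w in zip(pair, w_d))
    return 1.0 / (1.0 + math.exp(-z))

def trainer_emit_reward(pair, w_t):
    """Toy trainer policy: its 'action' is the reward handed to the student, in [-1, 1]."""
    z = sum(x * w for x, w in zip(pair, w_t))
    return math.tanh(z)

def trainer_own_reward(emitted, d_score):
    """Hypothetical trainer objective: highest when the emitted reward agrees
    with the discriminator's judgement rescaled to [-1, 1]."""
    target = 2.0 * d_score - 1.0
    return math.exp(-abs(emitted - target))

# One made-up state-action feature vector and made-up weights.
pair = [0.5, -1.0, 0.25]
w_d = [0.2, -0.4, 0.1]
w_t = [0.1, -0.3, 0.0]

d = discriminator_score(pair, w_d)        # discriminator's view of the pair
r_student = trainer_emit_reward(pair, w_t)  # reward the student would train on
r_trainer = trainer_own_reward(r_student, d)  # feedback the trainer would train on
print(f"D={d:.3f}  student reward={r_student:.3f}  trainer reward={r_trainer:.3f}")
```

In a full system each of the three components would be a learned network updated in the same loop; the sketch fixes the weights so that only the direction of the signals is visible.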
Peer Reviews
Decision · Submitted to ICLR 2026
- Clear, appealing reframing: optimizing a teacher policy instead of using a static discriminator reward.
- Practical promise: dynamic reward shaping within a single loop; results are competitive/strong on high-dimensional control.
- Modular: can plug in stronger discriminators and, in principle, other evaluators.
Baseline oddity: In several tasks GAIL is the strongest/near-strongest baseline. For a 2016 method, this is atypical and raises concerns about coverage (missing stronger recent IL/IRL/offline-IL variants) and tuning fairness (architectures, budgets, entropy, normalization, replay). Sparsity not fundamentally addressed: The trainer still derives its teaching signal from the discriminator; when the student is far from expert support, feedback can remain sparse/off-manifold. The method looks like
1. The core idea of using an RL agent to learn a reward-generating policy is highly original and represents a paradigm shift from existing methods.
2. The paper is backed by an extensive set of experiments covering ablation studies, computational analysis, fairness comparisons, and performance evaluations on diverse benchmarks. The results are consistently strong.
3. By achieving high performance with significantly better computational efficiency than IRL methods, RILe offers a more prac
1. The paper acknowledges that training the three-component system (student, trainer, discriminator) introduces stability challenges. While strategies like freezing the trainer are mentioned (and detailed in Appendix B), the main text lacks a discussion on how sensitive the method is to hyperparameters related to this stability (e.g., the frequency of freezing). A more quantitative analysis of this sensitivity would be helpful.
2. The comparison is excellent but could be even more compelling
- RILe leads to small but broad improvement across the board in Mujoco+
- In addition to the positive core result, there are a good number of experiments exploring the properties of various components of RILe (e.g. the reward function comparison (Fig 4), the impact of the function transform for the trainer agent's reward (Sec 5.1), the comparison of RILe to diffusion-based discriminators (Sec 5.5), and the study of the impact of noise).
- RILe leads to significantly improved robustness to noise
- **The existing results in the paper do not fully convince me that the method is sound.** The main result (Fig 4) presents relatively small empirical improvements in Mujoco that, in my view, do not justify the significant increase in algorithm complexity that RILe presents. Unstable multi-agent dynamics are already a challenge that adversarial imitation learning methods must cope with without introducing a third agent. The fact that various empirical tricks (e.g. freezing the trainer, tuning th
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Multimodal Machine Learning Applications
