TL;DR
RIZE is an inverse reinforcement learning method whose regularizer adapts the bounds on recovered rewards during training and which incorporates distributional RL. It reaches expert-level performance on MuJoCo and Adroit environments and surpasses baselines on Humanoid-v2 when expert demonstrations are limited.
Contribution
The paper presents an IRL method that combines a squared TD regularizer with adaptive targets and distributional RL, aiming for more flexible and robust reward recovery than fixed reward structures or implicit reward regularization.
Findings
Achieves expert-level performance on MuJoCo and Adroit environments.
Surpasses baseline methods on Humanoid-v2 with limited demonstrations.
Validates effectiveness through extensive experiments and ablation studies.
Abstract
We propose a novel Inverse Reinforcement Learning (IRL) method that mitigates the rigidity of fixed reward structures and the limited flexibility of implicit reward regularization. Building on the Maximum Entropy IRL framework, our approach incorporates a squared temporal-difference (TD) regularizer with adaptive targets that evolve dynamically during training, thereby imposing adaptive bounds on recovered rewards and promoting robust decision-making. To capture richer return information, we integrate distributional RL into the learning process. Empirically, our method achieves expert-level performance on complex MuJoCo and Adroit environments, surpassing baseline methods on the Humanoid-v2 task with limited expert demonstrations. Extensive experiments and ablation studies further validate the effectiveness of the approach and provide insights into reward dynamics in imitation learning.…
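The squared TD regularizer with adaptive targets can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the recovered reward takes the implicit TD form r = Q(s, a) - gamma * V(s'), that the target is a single scalar, and that the target is updated by plain gradient descent on the same squared penalty. The function names, `gamma`, and `lr` are hypothetical and chosen only for illustration.

```python
def implicit_reward(q_sa, v_next, gamma=0.99):
    """Assumed TD-style implicit reward: r = Q(s, a) - gamma * V(s')."""
    return [q - gamma * v for q, v in zip(q_sa, v_next)]


def squared_td_regularizer(rewards, target):
    """Mean squared gap between the implicit rewards and a scalar target."""
    return sum((r - target) ** 2 for r in rewards) / len(rewards)


def update_target(rewards, target, lr=0.1):
    """One gradient-descent step on the regularizer with respect to the target.

    The gradient of mean((r - target)^2) w.r.t. target is -2 * mean(r - target),
    so the target drifts toward the current mean reward instead of staying fixed.
    """
    grad = -2.0 * sum(r - target for r in rewards) / len(rewards)
    return target - lr * grad


if __name__ == "__main__":
    rewards = implicit_reward([1.0, 2.0], [0.0, 0.0], gamma=0.5)
    target = 0.0
    for _ in range(3):
        penalty = squared_td_regularizer(rewards, target)
        target = update_target(rewards, target, lr=0.25)
        print(penalty, target)
```

Under these assumptions, a fixed target would pin rewards near one constant, whereas the updated target follows the rewards as they change, which is one way to read "adaptive bounds on recovered rewards".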
