TL;DR
Masked IRL uses large language models to combine demonstrations with language instructions. Language helps disambiguate which reward a demonstration implies, and the reward model is steered toward the task-relevant parts of the state. The result is better generalization and sample-efficiency in reward learning.
Contribution
The paper introduces Masked IRL, a framework that uses LLMs to infer relevance masks (which state details matter for the task) and to clarify ambiguous instructions, improving reward learning from limited data.
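To make the relevance-mask idea concrete, here is a toy sketch. It is not the paper's model. The linear reward form, the two-feature state layout (`distance_to_goal`, `cup_color_id`), the weights, and the hand-written mask are all illustrative assumptions. In this toy setup the mask is hardcoded where Masked IRL would obtain it from an LLM. The weight on the color feature stands in for a spurious correlation picked up from limited demonstrations. Masking that dimension makes the reward ignore it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MaskedLinearReward:
    """Hypothetical toy reward r(s) = sum_i w_i * m_i * s_i.

    mask[i] = 1 keeps state dimension i; mask[i] = 0 zeroes it out,
    treating that dimension as task-irrelevant. (Illustrative only.)
    """
    weights: List[float]
    mask: List[int]

    def __call__(self, state: Sequence[float]) -> float:
        if not (len(state) == len(self.weights) == len(self.mask)):
            raise ValueError("state, weights and mask must have the same length")
        # Element-wise product of weights, mask and state, then summed.
        return sum(w * m * s for w, m, s in zip(self.weights, self.mask, state))


# Assumed state layout: [distance_to_goal, cup_color_id].
# Hardcoded stand-in for an LLM-inferred mask: only distance is relevant.
relevance_mask = [1, 0]
# The 0.7 weight on color mimics a spurious correlation from few demos.
reward = MaskedLinearReward(weights=[-1.0, 0.7], mask=relevance_mask)

s_a = [0.5, 1.0]  # same distance to goal, color id 1
s_b = [0.5, 3.0]  # same distance to goal, color id 3
print(reward(s_a), reward(s_b))  # -0.5 -0.5
```

With the mask applied, the two states differ only in an irrelevant detail, so they receive the same reward. Setting the mask to `[1, 1]` would let the spurious color weight change the reward.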
Findings
Outperforms prior language-conditioned IRL methods by up to 15%.
Uses up to 4.7 times less data for effective learning.
Improves sample-efficiency, generalization, and robustness to ambiguous language.
Abstract
Robots can adapt to user preferences by learning reward functions from demonstrations, but with limited data, reward models often overfit to spurious correlations and fail to generalize. This happens because demonstrations show robots how to do a task but not what matters for that task, causing the model to focus on irrelevant state details. Natural language can more directly specify what the robot should focus on, and, in principle, disambiguate between many reward functions consistent with the demonstrations. However, existing language-conditioned reward learning methods typically treat instructions as simple conditioning signals, without fully exploiting their potential to resolve ambiguity. Moreover, real instructions are often ambiguous themselves, so naive conditioning is unreliable. Our key insight is that these two input types carry complementary information: demonstrations show…