TL;DR
QuickLAP is a Bayesian framework that fuses language and physical feedback to learn reward functions in real time for semi-autonomous agents, making the learned behavior easier for users to understand and collaborate with.
Contribution
QuickLAP introduces a probabilistic method that uses LLMs to fuse language and physical cues for fast, real-time reward inference.
Findings
Reduces reward learning error by over 70% in simulations.
Participants found QuickLAP more understandable and preferred its behavior.
Code is publicly available at https://github.com/MIT-CLEAR-Lab/QuickLAP.
Abstract
Robots must learn from both what people do and what they say, but either modality alone is often incomplete: physical corrections are grounded but ambiguous in intent, while language expresses high-level goals but lacks physical grounding. We introduce QuickLAP: Quick Language-Action Preference learning, a Bayesian framework that fuses physical and language feedback to infer reward functions in real time. Our key insight is to treat language as a probabilistic observation over the user's latent preferences, clarifying which reward features matter and how physical corrections should be interpreted. QuickLAP uses Large Language Models (LLMs) to extract reward feature attention masks and preference shifts from free-form utterances, which it integrates with physical feedback in a closed-form update rule. This enables fast, real-time, and robust reward learning that handles ambiguous…
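The idea of a closed-form update that combines a physical correction with a language-derived attention mask and preference shift can be illustrated with a small sketch. This is not the paper's update rule, which the truncated abstract does not give. It is a generic per-feature conjugate Gaussian update written under stated assumptions: independent Gaussian beliefs over reward weights, a physical correction summarized as a suggested per-feature weight change, and an attention value in [0, 1] that scales how much each observation is trusted. All names (`GaussianBelief`, `fuse_update`, `phys_delta`, `lang_shift`, `attention`, `phys_var`, `lang_var`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianBelief:
    """Independent Gaussian belief over each reward weight (illustrative only)."""
    mean: List[float]
    var: List[float]


def fuse_update(belief, phys_delta, lang_shift, attention,
                phys_var=1.0, lang_var=1.0):
    """Precision-weighted Gaussian update for each reward feature.

    Assumption (not from the paper): each modality proposes a change to the
    current weight, and the language attention value scales the precision of
    both proposals, so unattended features are left at their prior.
    """
    new_mean, new_var = [], []
    for m, v, d, s, a in zip(belief.mean, belief.var,
                             phys_delta, lang_shift, attention):
        phys_prec = a / phys_var   # trust in the physical correction
        lang_prec = a / lang_var   # trust in the language-derived shift
        post_prec = 1.0 / v + phys_prec + lang_prec
        new_mean.append(m + (phys_prec * d + lang_prec * s) / post_prec)
        new_var.append(1.0 / post_prec)
    return GaussianBelief(new_mean, new_var)


# Two hypothetical features. The physical correction changes both equally,
# but the utterance attends only to feature 0.
prior = GaussianBelief(mean=[0.0, 0.0], var=[1.0, 1.0])
posterior = fuse_update(prior,
                        phys_delta=[1.0, 1.0],
                        lang_shift=[1.0, 0.0],
                        attention=[1.0, 0.0])
print(posterior)  # feature 0 moves to about 0.667; feature 1 stays at 0.0
```

In this toy setting the physical correction alone cannot tell which feature the user cares about, while the attention mask confines the update to the feature the utterance mentions. That is the disambiguation role the abstract attributes to language.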
