Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback
Josh Abramson, Arun Ahuja, Federico Carnevale, Petko Georgiev, Alex Goldin, Alden Hung, Jessica Landon, Jirka Lhotka, Timothy Lillicrap, Alistair Muldal, George Powell, Adam Santoro, Guy Scully, Sanjana Srivastava, Tamara von Glehn, Greg Wayne, Nathaniel Wong, Chen Yan

TL;DR
This paper demonstrates how reinforcement learning from human feedback (RLHF) can enhance the behavior of simulated embodied agents by leveraging human judgments to create effective reward models, improving interaction quality.
Contribution
The study introduces a novel 'Inter-temporal Bradley-Terry' (IBT) method to model human judgments and applies RLHF to improve agent performance in complex, embodied environments.
Findings
Agents trained with IBT-based rewards improve on all of the paper's reported metrics relative to their imitation-learned starting point.
Human judgments effectively guide reinforcement learning in embodied domains.
Improved agent behavior aligns better with human preferences.
Abstract
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live…
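To make the reward-modelling idea concrete, here is a minimal sketch of a Bradley-Terry style loss applied to two moments of the same episode. It is not the authors' implementation. It assumes that a "progress" annotation means the later moment should score higher than the earlier one and that a "regress" annotation means the opposite; the function names and the +1/-1 label encoding are hypothetical and chosen only for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bt_preference_prob(score_a: float, score_b: float) -> float:
    """Standard Bradley-Terry probability that a is preferred over b."""
    return math.exp(-softplus(score_b - score_a))


def inter_temporal_bt_loss(score_earlier: float, score_later: float, label: int) -> float:
    """Negative log-likelihood of one annotation under a Bradley-Terry model.

    score_earlier, score_later: reward-model outputs at two time points of
        the same episode (hypothetical inputs for this sketch).
    label: +1 for an annotated "progress" event (later moment preferred),
           -1 for an annotated "regress" event (earlier moment preferred).
           This encoding is an assumption, not taken from the paper.
    """
    if label not in (1, -1):
        raise ValueError("label must be +1 (progress) or -1 (regress)")
    # -log sigmoid(label * (later - earlier)) == softplus(-label * (later - earlier))
    return softplus(-label * (score_later - score_earlier))
```

Under these assumptions, minimising the loss over many annotated moments pushes the model's scores up across progress events and down across regress events, so the learned score can then serve as a reward signal for reinforcement learning.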
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
Methods: Balanced Selection
