Making Universal Policies Universal
Niklas Höpner, David Kuric, Herke van Hoof

TL;DR
This paper extends the universal policy framework to a cross-agent setting for sequential decision-making tasks, using a diffusion-based planner and an inverse dynamics model to enable positive transfer and generalization across agents that share an observation space but differ in their action spaces.
Contribution
It proposes a method for training the planner of a universal policy on a joint dataset of trajectories pooled from all agents, so that agents benefit from positive transfer while shared plans are adapted to each agent's constraints.
Findings
Achieved up to 42.20% improvement in task accuracy over single-agent training.
Demonstrated positive transfer across different agents in the BabyAI environment.
Showed the planner's ability to generalize to unseen agents.
Abstract
The development of a generalist agent capable of solving a wide range of sequential decision-making tasks remains a significant challenge. We address this problem in a cross-agent setup where agents share the same observation space but differ in their action spaces. Our approach builds on the universal policy framework, which decouples policy learning into two stages: a diffusion-based planner that generates observation sequences and an inverse dynamics model that assigns actions to these plans. We propose a method for training the planner on a joint dataset composed of trajectories from all agents. This method offers the benefit of positive transfer by pooling data from different agents, while the primary challenge lies in adapting shared plans to each agent's unique constraints. We evaluate our approach on the BabyAI environment, covering tasks of varying complexity, and demonstrate…
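The two-stage decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the planner here is a hand-written stand-in rather than a diffusion model, the inverse dynamics model is a lookup table rather than a learned network, and the grid observations, agent names, and function names (`stub_planner`, `inverse_dynamics`, `label_plan`) are all hypothetical. It only shows how one shared observation plan can be labeled with actions per agent, and how a plan may be infeasible for an agent whose action space cannot realize its transitions.

```python
from typing import Dict, List, Optional, Tuple

Obs = Tuple[int, int]  # toy observation: an (x, y) grid position (assumption)

# Hypothetical action spaces: same observations, different available moves.
ACTIONS: Dict[str, Dict[str, Obs]] = {
    "cardinal": {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)},
    "diagonal": {"ne": (1, 1), "nw": (-1, 1), "se": (1, -1), "sw": (-1, -1)},
}


def stub_planner(start: Obs, goal: Obs) -> List[Obs]:
    """Stand-in for the planner: emits an observation sequence only.

    Walks along x first, then along y, one cell per step. A diffusion-based
    planner would sample such a sequence instead of computing it by rule.
    """
    x, y = start
    plan = [(x, y)]
    while x != goal[0]:
        x += 1 if goal[0] > x else -1
        plan.append((x, y))
    while y != goal[1]:
        y += 1 if goal[1] > y else -1
        plan.append((x, y))
    return plan


def inverse_dynamics(agent: str, obs: Obs, next_obs: Obs) -> Optional[str]:
    """Return the agent's action that moves obs to next_obs, or None if the
    transition is not achievable in that agent's action space."""
    delta = (next_obs[0] - obs[0], next_obs[1] - obs[1])
    for name, move in ACTIONS[agent].items():
        if move == delta:
            return name
    return None


def label_plan(agent: str, plan: List[Obs]) -> List[Optional[str]]:
    """Assign one action to each consecutive pair of planned observations."""
    return [inverse_dynamics(agent, a, b) for a, b in zip(plan, plan[1:])]


if __name__ == "__main__":
    shared_plan = stub_planner((0, 0), (2, 1))
    for agent in ACTIONS:
        print(agent, label_plan(agent, shared_plan))
```

In this sketch the same plan is fully executable by the "cardinal" agent but contains only infeasible transitions for the "diagonal" agent, which is a small-scale picture of the challenge the abstract names: adapting shared plans to each agent's constraints.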
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Robot Manipulation and Learning
