Language-conditioned world model improves policy generalization by reading environmental descriptions
Anh Nguyen, Stefan Lee

TL;DR
This paper introduces LED-WM, a language-conditioned world model that enhances policy generalization in reinforcement learning by explicitly grounding language descriptions to environment entities, without relying on planning or expert demonstrations.
Contribution
The paper presents LED-WM, a novel attention-based encoder for DreamerV3 that improves language grounding and policy generalization in unseen environments without planning or demonstrations.
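To illustrate what an attention-based grounding step can look like, here is a minimal sketch of generic scaled dot-product cross-attention, in which each entity attends over the sentences of an environment description. This summary does not specify the LED-WM encoder in detail, so this should not be read as the authors' architecture. The function name `ground_entities`, the toy vectors, and the dimensions are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ground_entities(entity_queries, sentence_keys, sentence_values):
    """Hypothetical helper: for each entity query, attend over description sentences.

    Returns (attention weights per entity, language-grounded vector per entity).
    """
    scale = math.sqrt(len(sentence_keys[0]))
    n_features = len(sentence_values[0])
    all_weights, grounded = [], []
    for q in entity_queries:
        # Similarity of this entity to every sentence, normalized to a distribution.
        weights = softmax([dot(q, k) / scale for k in sentence_keys])
        # Weighted mix of sentence values: the entity's language-grounded feature.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, sentence_values))
            for i in range(n_features)
        ]
        all_weights.append(weights)
        grounded.append(mixed)
    return all_weights, grounded


# Toy, hand-picked vectors (assumed; a real model would learn these embeddings).
entity_queries = [[4.0, 0.0], [0.0, 4.0]]                # two entities
sentence_keys = [[0.0, 1.0], [1.0, 0.0]]                 # two description sentences
sentence_values = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]     # per-sentence content features

weights, grounded = ground_entities(entity_queries, sentence_keys, sentence_values)
```

In this toy setup the first entity places most of its attention on the second sentence and the second entity on the first, so each entity ends up with a feature vector dominated by the sentence that describes it. In a world model, vectors of this kind would be passed on together with the observation, but how LED-WM does this is not stated in this summary.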
Findings
LED-WM outperforms baselines in generalizing to unseen games.
Policies can be improved via fine-tuning on synthetic trajectories.
Explicit language grounding enhances environment understanding.
Abstract
To interact effectively with humans in the real world, it is important for agents to understand language that describes the dynamics of the environment (that is, how the environment behaves) rather than just task instructions specifying "what to do". Understanding this dynamics-descriptive language is important for human-agent interaction and agent behavior. Recent work addresses this problem using a model-based approach: language is incorporated into a world model, which is then used to learn a behavior policy. However, these existing methods either do not demonstrate policy generalization to unseen games or rely on limiting assumptions, for instance that the latency induced by inference-time planning is tolerable for the target task or that expert demonstrations are available. Expanding on this line of research, we focus on improving policy generalization from a…
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Social Robot Interaction and HRI
