Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments
Jinwoo Jang, Minjong Yoo, Sihyung Yoon, Honguk Woo

TL;DR
This paper introduces TMoW, a test-time adaptable mixture of world models for embodied agents, enabling better performance in dynamic environments through continual adaptation and few-shot learning.
Contribution
It extends the MoE paradigm by allowing test-time updates and prototype-based routing, significantly improving adaptability of embodied agents in unseen and evolving domains.
Findings
Strong zero-shot adaptation performance
Effective few-shot model expansion
Enhanced operation in dynamic environments
Abstract
Language model (LM)-based embodied agents are increasingly deployed in real-world settings. Yet, their adaptability remains limited in dynamic environments, where constructing accurate and flexible world models is crucial for effective reasoning and decision-making. To address this challenge, we extend the Mixture-of-Experts (MoE) paradigm to embodied agents. While conventional MoE architectures modularize knowledge into expert components with pre-trained routing, they remain rigid once deployed, making them less effective for adapting to unseen domains in dynamic environments. We therefore propose Test-time Mixture of World Models (TMoW), a framework that enhances adaptability to unseen and evolving domains. TMoW updates its routing function over world models at test time, unlike conventional MoE where the function remains fixed, enabling agents to recombine existing models and…
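The abstract's central idea, a routing function over world models that is updated at test time instead of staying fixed, can be illustrated with a small sketch. This is not the authors' implementation: the cosine-similarity softmax router, the weighted moving-average prototype update, and the names `route`, `refine`, `temperature`, and `rate` are all assumptions chosen for illustration, and the two-dimensional features are toy values.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def route(feature, prototypes, temperature=0.1):
    """Routing weights: softmax over similarities between the observation
    feature and each expert's prototype (assumed routing rule)."""
    sims = [cosine(feature, p) for p in prototypes]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

def refine(prototypes, feature, weights, rate=0.2):
    """Test-time update: move each prototype toward the observed feature,
    scaled by that expert's routing weight (assumed update rule)."""
    return [
        [(1 - rate * w) * p_i + rate * w * f_i for p_i, f_i in zip(p, feature)]
        for p, w in zip(prototypes, weights)
    ]

# Two hypothetical world-model experts, each summarized by one prototype.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
feature = [0.8, 0.6]  # toy feature from an unseen domain

before = route(feature, prototypes)
for _ in range(5):  # a few test-time refinement steps, no expert retraining
    prototypes = refine(prototypes, feature, route(feature, prototypes))
after = route(feature, prototypes)
```

In this toy setting only the prototypes change, so the experts themselves stay frozen while the mixture over them shifts toward the expert that best matches the new observations.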
Peer Reviews
Decision: ICLR 2026 Poster
S1) The paper is well-written, and the proposed pipeline is simple and easy to follow.
S2) The framework enables dynamic test-time adaptation of world model mixtures through prototype refinement, allowing rapid adjustment to unseen environments without retraining.
S3) The distilled model augmentation capability supports continuous expansion of the system's knowledge base through efficient few-shot learning from existing model mixtures.
W1) The paper emphasizes that multi-granular prototypes capture features from local objects to global scenes for fine-grained routing. However, when multiple distinct domains exhibit significant feature overlap at a certain granularity level (e.g., sharing similar local objects but having vastly different global scene semantics), how does the router effectively prevent expert confusion and erroneous activation?
W2) The test-time prototype refinement relies on online environmental interaction an
- The paper addresses an important gap in generalization of LM-based agents for embodied tasks and offers a solid alternative to costly retraining for new domains
- Authors conduct a thorough evaluation of the framework, on a comprehensive set of environments, tasks and baselines, showcasing significant performance improvement over state-of-the-art methods across both metrics (success rate and pending steps) and a very positive performance especially on unseen domains.
- The authors also trans
I would've liked to see some discussion of the inference time for such an approach compared to the baselines. How much time does the test-time adaptation add to the decision-making process, and could it be regarded as feasible in a real-world domain from this point of view?
TMoW demonstrates strong performance gains over state-of-the-art baselines in both zero-shot and few-shot adaptation across multiple embodied benchmarks.
1. The motivation of this paper is inaccurate. The abstract states that "conventional MoE architectures modularize knowledge into expert components with pre-trained routing; they remain rigid once deployed, making them less effective for adapting to unseen domains in dynamic environments." However, there is already a large number of works addressing Dynamic MoE, Adapted MoE, and MoE for Continual Test-time Adaptation.
2. Home environments inherently have low dynamics, making it difficult to accu
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning
