Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
Shaofei Cai, Zhancun Mu, Haiwen Xia, Bowei Zhang, Anji Liu, Yitao Liang

TL;DR
This paper demonstrates that reinforcement learning fine-tuning in Minecraft enables visuomotor agents to achieve zero-shot generalization in spatial reasoning across diverse environments, addressing overfitting and manual task design challenges.
Contribution
It introduces a unified multi-task goal space, automated task synthesis, and an efficient distributed RL framework for large-scale training of generalizable visuomotor agents.
Findings
RL improves interaction success rates by 4×
Enables zero-shot generalization in unseen environments
Validates large-scale multi-task RL in 3D environments
Abstract
While Reinforcement Learning (RL) has achieved remarkable success in language modeling, that success has not yet fully translated to visuomotor agents. A primary challenge for RL models is their tendency to overfit to specific tasks or environments, which hinders the acquisition of generalizable behaviors across diverse settings. This paper provides a preliminary answer to this challenge by demonstrating that RL-finetuned visuomotor agents in Minecraft can achieve zero-shot generalization to unseen worlds. Specifically, we explore RL's potential to enhance generalizable spatial reasoning and interaction capabilities in 3D worlds. To address challenges in multi-task RL representation, we analyze and establish cross-view goal specification as a unified multi-task goal space for visuomotor policies. Furthermore, to overcome the significant bottleneck of manual task design, we propose…
Taxonomy
Topics: Advanced Vision and Imaging · Tactile and Sensory Interactions · Robotics and Sensor-Based Localization
