Surfer: Progressive Reasoning with World Models for Robotic Manipulation
Pengzhen Ren, Kaidong Zhang, Hetao Zheng, Zixuan Li, Yuhang Wen, Fengda Zhu, Mas Ma, Xiaodan Liang

TL;DR
Surfer is a novel robot manipulation framework that models world knowledge explicitly, improving generalization on natural language instructions and physical tasks, supported by a new simulator and benchmark.
Contribution
The paper introduces Surfer, a world-model-based framework for robotic manipulation, together with a physics-based simulator and a progressive reasoning benchmark for evaluating generalization.
Findings
Surfer achieved a 54.74% success rate on manipulation tasks.
Surfer outperformed baseline methods with a 7.1% higher success rate.
The framework effectively generalizes to new instructions and scenes.
Abstract
A key challenge in robot manipulation is making the model accurately understand and follow natural language instructions while performing actions consistent with world knowledge. This mainly involves reasoning about fuzzy human instructions and adhering to physical knowledge. The embodied agent must therefore be able to model world knowledge from training data. However, most existing vision-and-language robot manipulation methods operate in less realistic simulators and language settings and lack explicit modeling of world knowledge. To bridge this gap, we introduce a novel and simple robot manipulation framework called Surfer. It is based on a world model, treats robot manipulation as a state transfer of the visual scene, and decouples it into two parts: action and scene. Then, the generalization ability of the model on new instructions and new…
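The decoupling into an action part and a scene part can be illustrated with a toy rollout loop. This is only a minimal sketch under loud assumptions: the scene is reduced to two integers on a 1-D track, and `predict_action`, `predict_scene`, and `rollout` are hypothetical names invented for illustration. None of this is Surfer's actual architecture, which the abstract does not specify in detail.

```python
from typing import List, Tuple

# Hypothetical toy scene: (gripper position, block position) on a 1-D track.
Scene = Tuple[int, int]
# Hypothetical toy action: move the gripper by -1, 0, or +1.
Action = int


def predict_action(scene: Scene) -> Action:
    """Action part (assumed): pick a step that moves the gripper toward the block."""
    gripper, block = scene
    if gripper < block:
        return 1
    if gripper > block:
        return -1
    return 0


def predict_scene(scene: Scene, action: Action) -> Scene:
    """Scene part (assumed): predict the next scene from the current scene and action."""
    gripper, block = scene
    return (gripper + action, block)


def rollout(scene: Scene, max_steps: int = 10) -> List[Scene]:
    """Treat manipulation as a sequence of scene-state transitions."""
    trajectory = [scene]
    for _ in range(max_steps):
        action = predict_action(scene)
        if action == 0:  # gripper has reached the block, so stop
            break
        scene = predict_scene(scene, action)
        trajectory.append(scene)
    return trajectory


print(rollout((0, 3)))  # prints [(0, 3), (1, 3), (2, 3), (3, 3)]
```

In this sketch the two predictors are plain functions so that the transition structure is easy to see; in a learned system each would presumably be a trained model conditioned on images and the language instruction.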
Taxonomy
Topics: Robot Manipulation and Learning · Robotic Path Planning Algorithms · Reinforcement Learning in Robotics
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Residual Connection
