MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation
Yu Sheng, Runfeng Lin, Lidian Wang, Quecheng Qiu, YanYong Zhang, Yu Zhang, Bei Hua, and Jianmin Ji

TL;DR
MSGField is a real-time, unified scene representation combining geometry, semantics, and motion, enabling effective language-guided robotic manipulation in dynamic and complex environments.
Contribution
The paper introduces MSGField, a novel scene representation using 2D Gaussians with semantic and motion attributes, optimized via differentiable rendering for real-time manipulation tasks.
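To make the idea of "2D Gaussians with semantic and motion attributes" concrete, here is a minimal sketch of one such primitive and a language query over a set of them. It assumes the semantic attribute is a feature vector compared against a text embedding by cosine similarity, and that motion is stored as per-primitive coefficients over shared motion bases. The names `Gaussian2D`, `cosine`, `query`, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Gaussian2D:
    """Hypothetical 2D Gaussian surfel carrying geometry, semantics and motion."""
    center: tuple[float, float, float]     # 3D position of the surfel
    tangent_u: tuple[float, float, float]  # first tangent axis (disk orientation)
    tangent_v: tuple[float, float, float]  # second tangent axis
    scale: tuple[float, float]             # extent along the two tangent axes
    opacity: float
    color: tuple[float, float, float]
    semantic: list[float]                  # ASSUMED: language-aligned feature vector
    # ASSUMED: coefficients over a small set of motion bases shared by all primitives
    motion_weights: list[float] = field(default_factory=list)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def query(gaussians: list[Gaussian2D], text_feature: list[float],
          threshold: float = 0.5) -> list[int]:
    """Indices of primitives whose semantic feature matches the text query."""
    return [i for i, g in enumerate(gaussians)
            if cosine(g.semantic, text_feature) >= threshold]
```

The selected primitives carry their own geometry, so in a setup like this the matched subset could be handed directly to a grasp planner without a separate segmentation step.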
Findings
Achieves a 79.2% success rate in static environments and 63.3% in dynamic environments.
Attains a 90% success rate in specified-object grasping, comparable to point-cloud-based methods.
Effectively handles flexible and small objects in complex scenes.
Abstract
Combining accurate geometry with rich semantics has proven highly effective for language-guided robotic manipulation. Existing methods for dynamic scenes either fail to update in real time or rely on additional depth sensors for simple scene editing, which limits their real-world applicability. In this paper, we introduce MSGField, a representation that uses a collection of 2D Gaussians for high-quality reconstruction, further enhanced with attributes that encode semantic and motion information. Specifically, we represent the motion field compactly by decomposing each primitive's motion into a combination of a limited set of motion bases. Leveraging the differentiable real-time rendering of Gaussian splatting, we can quickly optimize object motion, even complex non-rigid motion, with image supervision from only two camera views. Additionally, we designed a pipeline that…
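The abstract says each primitive's motion is a combination of a limited set of motion bases. The summary does not specify the basis parameterization, so the sketch below assumes each basis is a rigid transform (rotation `R_k`, translation `t_k`) for the current time step, and that a primitive's new center is the normalized weighted blend of its transformed positions, i.e. linear blend skinning, $c_i' = \sum_k \bar{w}_{ik}(R_k c_i + t_k)$. The helpers `blend_motion`, `mat_vec`, and `rot_z` are hypothetical illustrations, not the authors' code.

```python
import math

Vec3 = tuple[float, float, float]
Mat3 = tuple[Vec3, Vec3, Vec3]


def mat_vec(R: Mat3, v: Vec3) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def rot_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z axis (used only for the example)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def blend_motion(center: Vec3, weights: list[float],
                 bases: list[tuple[Mat3, Vec3]]) -> Vec3:
    """Move one primitive by a convex blend of K shared rigid motion bases.

    center:  primitive position at the reference time
    weights: per-primitive coefficients w_ik (normalized here to sum to 1)
    bases:   K rigid transforms (R_k, t_k) for the current time step
    """
    if len(weights) != len(bases):
        raise ValueError("need exactly one weight per motion basis")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    out = [0.0, 0.0, 0.0]
    for w, (R, t) in zip(weights, bases):
        moved = mat_vec(R, center)
        for j in range(3):
            out[j] += (w / total) * (moved[j] + t[j])
    return tuple(out)
```

Under this assumption, primitives with different weights follow different blends of the same bases, which is how a small basis set can still express non-rigid motion. Each new frame then only requires estimating the K shared transforms from the two camera views, rather than an independent motion for every primitive.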
Taxonomy
Topics: Image Processing and 3D Reconstruction · Robot Manipulation and Learning · Human Motion and Animation
