SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation
Hao Shi, Bin Xie, Yingfei Liu, Yang Yue, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Gao Huang

TL;DR
SpatialActor introduces a disentangled framework that explicitly separates semantics and geometry for robotic manipulation, improving robustness and generalization in noisy and diverse real-world scenarios.
Contribution
It proposes a novel framework that decouples semantics and geometry, incorporating adaptive fusion and low-level spatial cues for enhanced manipulation performance.
Findings
Achieves 87.4% on RLBench, outperforming previous methods.
Improves robustness by 13.9% to 19.4% under noisy conditions.
Enhances few-shot generalization to new tasks.
Abstract
Robotic manipulation requires precise spatial understanding to interact with objects in the real world. Point-based methods suffer from sparse sampling, leading to the loss of fine-grained semantics. Image-based methods typically feed RGB and depth into 2D backbones pre-trained on 3D auxiliary tasks, but their entangled semantics and geometry are sensitive to the depth noise inherent in real-world sensing, which disrupts semantic understanding. Moreover, these methods focus on high-level geometry while overlooking low-level spatial cues essential for precise interaction. We propose SpatialActor, a disentangled framework for robust robotic manipulation that explicitly decouples semantics and geometry. The Semantic-guided Geometric Module adaptively fuses two complementary geometric representations, one from noisy depth and one from semantic-guided expert priors. In addition, a Spatial Transformer leverages low-level spatial cues for…
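The adaptive fusion of the two geometry sources can be illustrated with a simple gating scheme. The following is a minimal sketch, not the paper's implementation: it assumes an element-wise sigmoid gate that blends a depth-derived feature with a prior-derived feature, and the names `gated_fusion`, `depth_feat`, `prior_feat`, and `gate_logits` are hypothetical. In a trained model the gate logits would presumably be predicted by a learned layer; here they are set by hand.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(depth_feat, prior_feat, gate_logits):
    """Blend two equal-length feature vectors element-wise.

    Assumption (not from the paper): a sigmoid gate g in (0, 1) weights the
    depth-derived feature, and (1 - g) weights the prior-derived feature.
    """
    if not (len(depth_feat) == len(prior_feat) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, p, z in zip(depth_feat, prior_feat, gate_logits):
        g = sigmoid(z)  # g near 1 trusts the depth feature, near 0 trusts the prior
        fused.append(g * d + (1.0 - g) * p)
    return fused


# Hypothetical toy inputs: the second depth value stands in for a noisy reading.
depth_feat = [0.9, -2.0, 0.1]
prior_feat = [1.0, 0.5, 0.0]
gate_logits = [0.0, -20.0, 20.0]  # hand-set; a real model would learn these

fused = gated_fusion(depth_feat, prior_feat, gate_logits)
```

With these toy values, the first element is an even blend of both sources, the second falls back almost entirely on the prior, and the third keeps the depth feature almost unchanged.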
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
