SpatialActor: Exploring Disentangled Spatial Representations for Robust Robotic Manipulation

Hao Shi; Bin Xie; Yingfei Liu; Yang Yue; Tiancai Wang; Haoqiang Fan; Xiangyu Zhang; Gao Huang

arXiv:2511.09555·cs.RO·January 14, 2026

TL;DR

SpatialActor introduces a disentangled framework that explicitly separates semantics and geometry for robotic manipulation, improving robustness and generalization in noisy and diverse real-world scenarios.

Contribution

The paper proposes a framework that decouples semantics from geometry, adaptively fuses geometry derived from noisy depth with geometry derived from semantic-guided expert priors, and uses low-level spatial cues to improve manipulation performance.

Findings

01. Achieves 87.4% on RLBench, outperforming previous methods.

02. Improves robustness by 13.9% to 19.4% under noisy conditions.

03. Enhances few-shot generalization to new tasks.

Abstract

Robotic manipulation requires precise spatial understanding to interact with objects in the real world. Point-based methods suffer from sparse sampling, leading to the loss of fine-grained semantics. Image-based methods typically feed RGB and depth into 2D backbones pre-trained on 3D auxiliary tasks, but their entangled semantics and geometry are sensitive to the depth noise inherent in real-world settings, which disrupts semantic understanding. Moreover, these methods focus on high-level geometry while overlooking the low-level spatial cues essential for precise interaction. We propose SpatialActor, a disentangled framework for robust robotic manipulation that explicitly decouples semantics and geometry. The Semantic-guided Geometric Module adaptively fuses two complementary geometric representations, one from noisy depth and one from semantic-guided expert priors. In addition, a Spatial Transformer leverages low-level spatial cues for…
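
The adaptive fusion mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple element-wise sigmoid gate, and the names `gated_fuse`, `noisy_geo`, `prior_geo`, and `gate_logits` are hypothetical. In a real model the gate logits would be predicted by a learned layer; here they are passed in directly to show only the blending step.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(noisy_geo, prior_geo, gate_logits):
    """Blend two geometric feature vectors element-wise (illustrative only).

    noisy_geo   -- hypothetical features derived from raw, noisy depth
    prior_geo   -- hypothetical features derived from a semantic-guided prior
    gate_logits -- per-element scores; a learned layer would produce these

    A gate near 1 favours the noisy-depth feature, and a gate near 0
    favours the prior feature.
    """
    if not (len(noisy_geo) == len(prior_geo) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for n, p, z in zip(noisy_geo, prior_geo, gate_logits):
        g = sigmoid(z)  # gate value in (0, 1)
        fused.append(g * n + (1.0 - g) * p)  # convex combination
    return fused


if __name__ == "__main__":
    # Zero logits give a gate of 0.5, so the result is the plain average.
    print(gated_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
```

Because the output is a convex combination, each fused value always lies between the two inputs. A gate of this kind lets a model lean on the prior where depth is unreliable and on measured depth elsewhere, which is the general motivation the abstract gives for fusing the two sources.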

Taxonomy

Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis