Structural Action Transformer for 3D Dexterous Manipulation

Xiaohan Lei; Min Wang; Bohong Weng; Wengang Zhou; Houqiang Li

arXiv:2603.03960 · cs.RO · March 5, 2026

TL;DR

This paper introduces the Structural Action Transformer (SAT), a 3D dexterous manipulation policy that adopts a structural-centric action representation to enable cross-embodiment skill transfer for high-DoF robotic hands.

Contribution

The paper proposes a structural-centric action representation and an Embodied Joint Codebook to improve cross-embodiment transfer and sample efficiency in 3D dexterous manipulation.

Findings

1. Outperforms baseline methods in simulation and real-world tasks
2. Demonstrates superior sample efficiency in learning from heterogeneous datasets
3. Enables effective transfer of skills across different robotic hand embodiments

Abstract

Achieving human-level dexterity in robots via imitation learning from heterogeneous datasets is hindered by the challenge of cross-embodiment skill transfer, particularly for high-DoF robotic hands. Existing methods, often relying on 2D observations and temporal-centric action representation, struggle to capture 3D spatial relations and fail to handle embodiment heterogeneity. This paper proposes the Structural Action Transformer (SAT), a new 3D dexterous manipulation policy that challenges this paradigm by introducing a structural-centric perspective. We reframe each action chunk not as a temporal sequence, but as a variable-length, unordered sequence of joint-wise trajectories. This structural formulation allows a Transformer to natively handle heterogeneous embodiments, treating the joint count as a variable sequence length. To encode structural priors and resolve ambiguity, we…
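The core reframing in the abstract, an action chunk treated as an unordered set of joint-wise trajectories whose size is the joint count, can be illustrated with a minimal NumPy sketch. This is not SAT's architecture: the helper names (`chunk_to_joint_tokens`, `pad_batch`, `masked_self_attention`), the single-head attention without positional encoding, the random projection weights, and the 16- and 22-joint hands are all hypothetical assumptions chosen for illustration. It also omits the structural priors the paper adds to resolve ambiguity between joints, which is exactly the gap this bare version exposes: with no joint identity, the layer cannot tell joints apart.

```python
import numpy as np

def chunk_to_joint_tokens(action_chunk):
    """Hypothetical: (T, J) action chunk -> (J, T), one token per joint trajectory."""
    return np.asarray(action_chunk, dtype=np.float64).T

def pad_batch(token_sets):
    """Pad variable joint counts to the batch maximum; mask marks real joints."""
    max_j = max(t.shape[0] for t in token_sets)
    horizon = token_sets[0].shape[1]
    batch = np.zeros((len(token_sets), max_j, horizon))
    mask = np.zeros((len(token_sets), max_j), dtype=bool)
    for i, t in enumerate(token_sets):
        batch[i, : t.shape[0]] = t
        mask[i, : t.shape[0]] = True
    return batch, mask

def masked_self_attention(x, mask, w_q, w_k, w_v):
    """Single-head attention over joint tokens; no positional encoding, so the
    output is permutation-equivariant over joints. Padded keys are masked out."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                        # (B, J, D)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])   # (B, J, J)
    scores = np.where(mask[:, None, :], scores, -1e9)          # ignore padded joints
    scores = scores - scores.max(axis=-1, keepdims=True)       # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                                         # (B, J, D)

# Usage: two hypothetical hands with different joint counts share one batch.
rng = np.random.default_rng(0)
T, D = 8, 16                        # chunk horizon and token width (assumed)
hand_a = rng.normal(size=(T, 16))   # hypothetical 16-joint hand
hand_b = rng.normal(size=(T, 22))   # hypothetical 22-joint hand
tokens, mask = pad_batch([chunk_to_joint_tokens(hand_a), chunk_to_joint_tokens(hand_b)])
w_q, w_k, w_v = (rng.normal(size=(T, D)) for _ in range(3))
out = masked_self_attention(tokens, mask, w_q, w_k, w_v)
print(out.shape)  # (2, 22, 16)
```

Padding plus a key mask is one common way to batch variable-length sequences; it lets the joint count play the role of sequence length, as the abstract describes, while leaving each hand's valid outputs unaffected by the padding.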

Taxonomy

Topics: Robot Manipulation and Learning · Human Motion and Animation · Human Pose and Action Recognition