G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

Tianxing Chen; Yao Mu; Zhixuan Liang; Zanxin Chen; Shijia Peng; Qiangyu Chen; Mingkun Xu; Ruizhen Hu; Hongyuan Zhang; Xuelong Li; Ping Luo

arXiv:2411.18369 · cs.RO · June 24, 2025

TL;DR

G3Flow introduces a real-time semantic flow framework that enhances 3D robotic manipulation by integrating foundation models, leading to improved success rates and generalization in complex tasks.

Contribution

The paper presents G3Flow, a novel framework combining 3D generative models, foundation models, and pose tracking to achieve semantic understanding without manual annotations.

Findings

01. Achieves up to 68.3% success in terminal-constrained tasks.

02. Attains 50.1% success in cross-object generalization.

03. Outperforms existing methods across five simulation tasks.

Abstract

Recent advances in imitation learning for 3D robotic manipulation have shown promising results with diffusion-based policies. However, achieving human-level dexterity requires seamless integration of geometric precision and semantic understanding. We present G3Flow, a novel framework that constructs real-time semantic flow, a dynamic, object-centric 3D semantic representation by leveraging foundation models. Our approach uniquely combines 3D generative models for digital twin creation, vision foundation models for semantic feature extraction, and robust pose tracking for continuous semantic flow updates. This integration enables complete semantic understanding even under occlusions while eliminating manual annotation requirements. By incorporating semantic flow into diffusion policies, we demonstrate significant improvements in both terminal-constrained manipulation and cross-object…
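The abstract describes semantic features attached to an object and kept up to date by pose tracking. One way to picture that update step is the minimal sketch below. It is not the authors' implementation: the function names (`pose_from_yaw`, `transform_point`, `update_semantic_flow`), the yaw-only rotation, and the toy two-dimensional feature vectors are all assumptions made for illustration. The idea shown is only that per-point features are computed once in the object frame and then carried into the scene frame by the current tracked pose.

```python
import math


def pose_from_yaw(yaw, translation):
    """Hypothetical helper: 4x4 homogeneous transform (rotation about z, then translation)."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(pose, point):
    """Apply the rigid transform `pose` to a single 3D point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in pose[:3])


def update_semantic_flow(object_points, features, pose):
    """Move object-frame points into the scene frame; each feature stays with its point.

    Assumption: features are pose-invariant, so only the coordinates change per frame.
    """
    return [(transform_point(pose, p), f) for p, f in zip(object_points, features)]


# Toy data: two object-frame points with made-up feature vectors.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
feats = [[0.2, 0.8], [0.9, 0.1]]

# A tracked pose for the current frame: 90 degree yaw and a 1 m lift along z.
pose = pose_from_yaw(math.pi / 2, (0.0, 0.0, 1.0))
flow = update_semantic_flow(points, feats, pose)
```

Because the features are fixed in the object frame, points that are occluded in the current camera view still receive coordinates and features, which is consistent with the abstract's claim about handling occlusions.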

Code & Models

Repositories

TianxingChen/RoboTwin (PyTorch)

Models

iMihayo/custom_robotwin (Hugging Face model)

Taxonomy

Topics: Human Motion and Animation · Image Processing and 3D Reconstruction · Human Pose and Action Recognition

Methods: Diffusion