Articulate3D: Zero-Shot Text-Driven 3D Object Posing

Oishi Deb; Anjun Hu; Ashkan Khakzar; Philip Torr; Christian Rupprecht

arXiv:2508.19244·cs.CV·August 27, 2025

TL;DR

Articulate3D introduces a training-free, text-driven method for posing 3D objects by combining image generation, keypoint-based alignment, and a novel self-attention mechanism, enabling flexible and identity-preserving pose manipulation.

Contribution

The paper presents a novel zero-shot approach that leverages a modified image generator and keypoint alignment for 3D object posing without training, outperforming existing methods.

Findings

01. Successfully manipulates 3D object poses with diverse text prompts
02. Achieves over 85% preference in user studies
03. Demonstrates robustness across various object types

Abstract

We propose a training-free method, Articulate3D, to pose a 3D asset through language control. Despite advances in vision and language models, this task remains surprisingly challenging. To achieve this goal, we decompose the problem into two steps. We modify a powerful image-generator to create target images conditioned on the input image and a text instruction. We then align the mesh to the target images through a multi-view pose optimisation step. In detail, we introduce a self-attention rewiring mechanism (RSActrl) that decouples the source structure from pose within an image generative model, allowing it to maintain a consistent structure across varying poses. We observed that differentiable rendering is an unreliable signal for articulation optimisation; instead, we use keypoints to establish correspondences between input and target images. The effectiveness of Articulate3D is…
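The keypoint-based alignment idea mentioned in the abstract can be illustrated with a deliberately simplified toy. The sketch below is not the paper's multi-view 3D optimisation: it assumes a single rigid part that rotates about a known hinge in a 2D image plane, and it assumes keypoint correspondences between the input and target images are already given. The function names `rotate_about`, `fit_hinge_angle`, and `keypoint_loss` are hypothetical and chosen only for illustration. Under those assumptions, the angle that minimises the squared distance between matched keypoints has a closed form.

```python
import numpy as np


def rotate_about(points, pivot, angle):
    """Rotate 2D points (N, 2) counter-clockwise by `angle` radians about `pivot`."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return (points - pivot) @ rotation.T + pivot


def fit_hinge_angle(src_kp, tgt_kp, pivot):
    """Least-squares hinge angle mapping source keypoints onto target keypoints.

    Assumption: src_kp[i] and tgt_kp[i] are matched keypoints on one rigid part
    that rotates about `pivot`. The minimiser of the summed squared distances is
    atan2(sum of 2D cross products, sum of dot products) of the centred points.
    """
    a = src_kp - pivot
    b = tgt_kp - pivot
    cross = np.sum(a[:, 0] * b[:, 1] - a[:, 1] * b[:, 0])
    dot = np.sum(a * b)
    return np.arctan2(cross, dot)


def keypoint_loss(src_kp, tgt_kp, pivot, angle):
    """Mean squared distance between rotated source keypoints and target keypoints."""
    moved = rotate_about(src_kp, pivot, angle)
    return float(np.mean(np.sum((moved - tgt_kp) ** 2, axis=1)))


# Toy usage: keypoints on a "limb" hinged at the origin, posed by 0.6 rad.
pivot = np.array([0.0, 0.0])
src_kp = np.array([[1.0, 0.0], [2.0, 0.1], [3.0, -0.1]])
true_angle = 0.6
tgt_kp = rotate_about(src_kp, pivot, true_angle)

est_angle = fit_hinge_angle(src_kp, tgt_kp, pivot)
residual = keypoint_loss(src_kp, tgt_kp, pivot, est_angle)
```

Because the objective depends only on matched point positions, it gives a direct signal about how far each part must move. That is the property this toy is meant to show, and it is only loosely related to the abstract's remark that differentiable rendering was an unreliable signal for articulation optimisation.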
