Articulate3D: Zero-Shot Text-Driven 3D Object Posing
Oishi Deb, Anjun Hu, Ashkan Khakzar, Philip Torr, Christian Rupprecht

TL;DR
Articulate3D is a training-free, text-driven method for posing 3D objects. It combines a modified image generator, which uses a self-attention rewiring mechanism (RSActrl) to keep the object's structure consistent across poses, with keypoint-based multi-view alignment of the mesh to the generated target images.
Contribution
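The paper presents a zero-shot approach to 3D object posing that requires no training: a modified image generator produces target images from the input image and a text instruction, and keypoint-based alignment fits the mesh to those targets. The method is reported to outperform existing methods.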
Findings
Successfully manipulates 3D object poses with diverse text prompts
Achieves over 85% preference in user studies
Demonstrates robustness across various object types
Abstract
We propose a training-free method, Articulate3D, to pose a 3D asset through language control. Despite advances in vision and language models, this task remains surprisingly challenging. To achieve this goal, we decompose the problem into two steps. We modify a powerful image generator to create target images conditioned on the input image and a text instruction. We then align the mesh to the target images through a multi-view pose optimisation step. In detail, we introduce a self-attention rewiring mechanism (RSActrl) that decouples the source structure from pose within an image generative model, allowing it to maintain a consistent structure across varying poses. We observed that differentiable rendering is an unreliable signal for articulation optimisation; instead, we use keypoints to establish correspondences between input and target images. The effectiveness of Articulate3D is…
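The keypoint-based, multi-view alignment step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a single hinge joint rotating about the z-axis, two orthographic views, known keypoint correspondences, and a finite-difference gradient. All function names (`rotate_about_z`, `project`, `keypoint_loss`, `fit_joint_angle`) are hypothetical and chosen only for illustration.

```python
import math

def rotate_about_z(point, angle):
    """Rotate a 3D point about the z-axis (an assumed stand-in hinge joint)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def project(point, view):
    """Assumed orthographic cameras: 'front' drops z, 'side' drops x."""
    x, y, z = point
    return (x, y) if view == "front" else (z, y)

def keypoint_loss(angle, rest_keypoints, targets):
    """Sum of squared 2D distances between posed and target keypoints over all views."""
    loss = 0.0
    for view, target_2d in targets.items():
        for kp, (tu, tv) in zip(rest_keypoints, target_2d):
            u, v = project(rotate_about_z(kp, angle), view)
            loss += (u - tu) ** 2 + (v - tv) ** 2
    return loss

def fit_joint_angle(rest_keypoints, targets, steps=200, lr=0.05, eps=1e-5):
    """Gradient descent on the joint angle using a central finite difference."""
    angle = 0.0
    for _ in range(steps):
        grad = (keypoint_loss(angle + eps, rest_keypoints, targets)
                - keypoint_loss(angle - eps, rest_keypoints, targets)) / (2 * eps)
        angle -= lr * grad
    return angle

# Synthetic example: keypoints on a limb in its rest pose, and the 2D keypoints
# that would be detected in target images showing the limb bent by 0.6 rad.
rest_keypoints = [(1.0, 0.0, 0.2), (2.0, 0.0, 0.5)]
true_angle = 0.6
targets = {
    view: [project(rotate_about_z(kp, true_angle), view) for kp in rest_keypoints]
    for view in ("front", "side")
}

estimated = fit_joint_angle(rest_keypoints, targets)
print(f"estimated joint angle: {estimated:.3f} rad")
```

The loss here compares only a handful of corresponding 2D points per view rather than rendered pixels, which mirrors the abstract's stated preference for keypoint correspondences over differentiable rendering as the optimisation signal. A real pipeline would need a keypoint detector or matcher and a full articulated skeleton, neither of which is modelled in this sketch.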
