MagicPony: Learning Articulated 3D Animals in the Wild
Shangzhe Wu, Ruining Li, Tomas Jakab, Christian Rupprecht, Andrea Vedaldi

TL;DR
MagicPony predicts the 3D shape, articulation, viewpoint, texture, and lighting of an animal from a single image. It learns this predictor from in-the-wild single-view images, with minimal assumptions about the topology of deformation.
Contribution
It introduces an implicit-explicit shape and appearance representation combined with knowledge distillation from a vision transformer, enabling accurate 3D animal reconstruction from real images.
Findings
Outperforms prior methods on 3D animal reconstruction tasks.
Demonstrates strong generalization to art images.
Learns from in-the-wild images with minimal assumptions.
Abstract
We consider the problem of predicting the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse given a single test image as input. We present a new method, dubbed MagicPony, that learns this predictor purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no additional training cost. MagicPony outperforms prior work on this challenging task…
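The abstract does not spell out how the viewpoint sampling scheme works. One way such a scheme could avoid extra training cost is to keep several viewpoint hypotheses, render only one of them per training step, and train a score for each hypothesis to track its reconstruction loss. The sketch below illustrates that idea only; it is an assumption made for illustration, not the authors' implementation, and the names `sample_hypothesis`, `score_regression_loss`, and `temperature` are hypothetical.

```python
import numpy as np


def sample_hypothesis(scores, rng, temperature=1.0):
    """Sample one viewpoint hypothesis index from predicted scores.

    Assumption: each score is a predicted reconstruction loss, so a lower
    score means a more plausible viewpoint and a higher sampling probability.
    """
    logits = -np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    index = int(rng.choice(len(probs), p=probs))
    return index, probs


def score_regression_loss(predicted_score, reconstruction_loss):
    """Squared error that pulls the sampled hypothesis's score toward the
    reconstruction loss actually measured for that hypothesis."""
    return (predicted_score - reconstruction_loss) ** 2


rng = np.random.default_rng(0)
scores = [0.2, 1.0, 1.5, 3.0]  # hypothetical scores for four hypotheses
index, probs = sample_hypothesis(scores, rng)
```

Because only the sampled hypothesis is rendered and compared with the image, each step costs the same as training with a single viewpoint, while the scores gradually learn which hypothesis to prefer.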
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Advanced Vision and Imaging · 3D Shape Modeling and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Test · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer
