SHIC: Shape-Image Correspondences with no Keypoint Supervision
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi

TL;DR
SHIC learns canonical surface maps without manual supervision by leveraging foundation models and image generation, and achieves better results than supervised methods for most categories.
Contribution
The paper presents SHIC, a method that learns shape-image correspondences without manual supervision. It reduces image-to-template correspondence to image-to-image correspondence, computed using features from foundation models.
Findings
Outperforms supervised methods on most categories
Leverages foundation models like DINO and Stable Diffusion
Uses image generation to enhance template realism
Abstract
Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template. Popularised by DensePose for the analysis of humans, authors have since attempted to apply the concept to more categories, but with limited success due to the high cost of manual supervision. In this work, we introduce SHIC, a method to learn canonical maps without manual supervision which achieves better results than supervised methods for most categories. Our idea is to leverage foundation computer vision models such as DINO and Stable Diffusion that are open-ended and thus possess excellent priors over natural categories. SHIC reduces the problem of estimating image-to-template correspondences to predicting image-to-image correspondences using features from the foundation models. The reduction works by matching images of the object to…
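The image-to-image matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only shows the generic idea of matching feature vectors by cosine similarity. The function name `match_by_cosine_similarity` and the random arrays standing in for foundation-model features (e.g. from DINO) are assumptions made for illustration.

```python
import numpy as np

def match_by_cosine_similarity(query_feats, ref_feats):
    """For each query feature, find the most similar reference feature.

    query_feats: (N, D) array, e.g. per-pixel features of an image.
    ref_feats:   (M, D) array, e.g. features at reference (template-side) locations.
    Returns (indices, scores), each of shape (N,).
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = q @ r.T  # (N, M) cosine-similarity matrix
    return sim.argmax(axis=1), sim.max(axis=1)

# Synthetic stand-ins for real features (assumption: no actual model is run here).
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(50, 16))            # 50 reference locations, 16-dim features
true_idx = rng.permutation(50)[:20]              # ground-truth matches for 20 queries
query_feats = ref_feats[true_idx] + 0.01 * rng.normal(size=(20, 16))  # slightly noisy copies

pred_idx, scores = match_by_cosine_similarity(query_feats, ref_feats)
```

In a real pipeline the two feature arrays would come from a pretrained backbone applied to the image and to views of the template. The sketch covers only the nearest-neighbour lookup.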
Taxonomy
Topics: Advanced Vision and Imaging
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Layer Normalization · Softmax · Diffusion · Dense Connections · Vision Transformer · self-DIstillation with NO labels
