SHIC: Shape-Image Correspondences with no Keypoint Supervision

Aleksandar Shtedritski; Christian Rupprecht; Andrea Vedaldi

arXiv:2407.18907·cs.CV·July 29, 2024

TL;DR

SHIC learns canonical surface mappings without manual keypoint supervision by leveraging foundation models and image generation, and outperforms supervised methods on most categories.

Contribution

The paper presents SHIC, a method that learns shape-image correspondences without manual supervision by utilizing foundation models and image generation techniques.

Findings

01. Outperforms supervised methods on most categories

02. Leverages foundation models like DINO and Stable Diffusion

03. Uses image generation to enhance template realism

Abstract

Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template. Popularised by DensePose for the analysis of humans, the concept has since been applied to more categories, but with limited success due to the high cost of manual supervision. In this work, we introduce SHIC, a method to learn canonical maps without manual supervision which achieves better results than supervised methods for most categories. Our idea is to leverage foundation computer vision models such as DINO and Stable Diffusion, which are open-ended and thus possess excellent priors over natural categories. SHIC reduces the problem of estimating image-to-template correspondences to predicting image-to-image correspondences using features from the foundation models. The reduction works by matching images of the object to…
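The core idea of predicting correspondences from features can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-location feature vectors have already been extracted by some backbone (random arrays stand in for them here) and matches each query feature to its nearest target feature by cosine similarity. The names `l2_normalize` and `match_features` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def match_features(query_feats, target_feats):
    """Nearest-neighbour matching by cosine similarity.

    query_feats:  (N, D) array, one feature per query location.
    target_feats: (M, D) array, one feature per target location.
    Returns, for each query row, the index of the best-matching
    target row and the corresponding cosine similarity.
    """
    q = l2_normalize(query_feats)
    t = l2_normalize(target_feats)
    sim = q @ t.T                      # (N, M) cosine similarities
    idx = sim.argmax(axis=1)           # best target per query
    score = sim[np.arange(len(idx)), idx]
    return idx, score


# Placeholder features (assumption): in practice these would come
# from a pretrained feature extractor, not a random generator.
rng = np.random.default_rng(0)
target = rng.normal(size=(5, 16))
# Queries are slightly perturbed copies of target rows 3, 0 and 4.
query = target[[3, 0, 4]] + 0.01 * rng.normal(size=(3, 16))

idx, score = match_features(query, target)
print(idx, score)
```

With real features, `query_feats` would hold one row per image pixel or patch, and `idx` would give a dense correspondence map once reshaped to the image grid. How the target features are obtained is specific to the paper and is not reproduced here.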

Taxonomy

Topics: Advanced Vision and Imaging

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Layer Normalization · Softmax · Diffusion · Dense Connections · Vision Transformer · self-DIstillation with NO labels