Neural Congealing: Aligning Images to a Joint Semantic Atlas

Dolev Ofri-Amar; Michal Geyer; Yoni Kasten; Tali Dekel

arXiv:2302.03956 · cs.CV · March 7, 2023

Open Access

TL;DR

Neural Congealing is a zero-shot, self-supervised framework that aligns the shared content of an image set to a joint semantic atlas using pre-trained DINO-ViT features, handling severe variations in appearance and pose without additional annotations such as segmentation masks.

Contribution

It introduces a self-supervised method that uses pre-trained DINO-ViT features to jointly align a small set of images to a semantic atlas, optimized per image set and requiring no additional inputs such as segmentation masks.

Findings

01. Effective alignment across diverse image sets
02. Outperforms state-of-the-art methods without extensive training
03. Handles severe variations in appearance, pose, and background

Abstract

We present Neural Congealing -- a zero-shot self-supervised framework for detecting and jointly aligning semantically-common content across a given set of images. Our approach harnesses the power of pre-trained DINO-ViT features to learn: (i) a joint semantic atlas -- a 2D grid that captures the mode of DINO-ViT features in the input set, and (ii) dense mappings from the unified atlas to each of the input images. We derive a new robust self-supervised framework that optimizes the atlas representation and mappings per image set, requiring only a few real-world images as input without any additional input information (e.g., segmentation masks). Notably, we design our losses and training paradigm to account only for the shared content under severe variations in appearance, pose, background clutter or other distracting objects. We demonstrate results on a plethora of challenging image sets…
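The two learned components named in the abstract, an atlas defined on a 2D grid and a dense mapping from the atlas to each image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the paper optimizes the atlas and mappings with its own losses, whereas the sketch below assumes fixed 2x3 affine maps, random arrays standing in for DINO-ViT feature maps, and a per-pixel median as a crude proxy for the feature "mode". All function names (`atlas_grid`, `bilinear_sample`, `warp_to_atlas`, `robust_atlas`) are hypothetical.

```python
import numpy as np

def atlas_grid(h, w):
    """Return (h*w, 2) atlas coordinates (x, y), normalised to [-1, 1]."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

def bilinear_sample(feat, pts):
    """Sample feat (H, W, C) at normalised points pts (N, 2).

    Returns the sampled features (N, C) and a boolean in-bounds mask (N,).
    Out-of-bounds points are clamped to the border before sampling.
    """
    H, W, _ = feat.shape
    x = (pts[:, 0] + 1) * 0.5 * (W - 1)
    y = (pts[:, 1] + 1) * 0.5 * (H - 1)
    inside = (x >= 0) & (x <= W - 1) & (y >= 0) & (y <= H - 1)
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy, inside

def warp_to_atlas(feat, affine, h, w):
    """Pull image features into atlas space.

    `affine` is an assumed 2x3 atlas-to-image map; the paper's mappings
    are dense and learned, so this is only a simplified stand-in.
    """
    pts = atlas_grid(h, w)
    mapped = pts @ affine[:, :2].T + affine[:, 2]
    sampled, inside = bilinear_sample(feat, mapped)
    return sampled.reshape(h, w, -1), inside.reshape(h, w)

def robust_atlas(warped):
    """Per-pixel median over warped feature maps (proxy for the mode)."""
    return np.median(np.stack(warped, axis=0), axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for per-image DINO-ViT feature maps.
    feats = [rng.normal(size=(8, 8, 16)) for _ in range(3)]
    identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    warped = [warp_to_atlas(f, identity, 8, 8)[0] for f in feats]
    print(robust_atlas(warped).shape)
```

The mapping runs from atlas to image, as in the abstract, so every atlas location pulls a feature from each image by interpolation. The in-bounds mask shows which atlas locations an image actually covers.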

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications