Neural Congealing: Aligning Images to a Joint Semantic Atlas
Dolev Ofri-Amar, Michal Geyer, Yoni Kasten, Tali Dekel

TL;DR
Neural Congealing is a zero-shot, self-supervised framework that aligns images to a shared semantic atlas using pre-trained features, handling diverse variations without additional annotations.
Contribution
It introduces a self-supervised method that uses pre-trained DINO-ViT features to jointly align a small set of images to a shared semantic atlas. The atlas and mappings are optimized per image set, with no additional inputs such as segmentation masks.
Findings
Effective alignment across diverse image sets
Outperforms state-of-the-art methods without extensive training
Handles severe variations in appearance, pose, and background
Abstract
We present Neural Congealing -- a zero-shot self-supervised framework for detecting and jointly aligning semantically-common content across a given set of images. Our approach harnesses the power of pre-trained DINO-ViT features to learn: (i) a joint semantic atlas -- a 2D grid that captures the mode of DINO-ViT features in the input set, and (ii) dense mappings from the unified atlas to each of the input images. We derive a new robust self-supervised framework that optimizes the atlas representation and mappings per image set, requiring only a few real-world images as input without any additional input information (e.g., segmentation masks). Notably, we design our losses and training paradigm to account only for the shared content under severe variations in appearance, pose, background clutter or other distracting objects. We demonstrate results on a plethora of challenging image sets…
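The two learned components named in the abstract, an atlas of features on a 2D grid and a mapping from the atlas into each image, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the losses or the form of the mappings, so the 2x3 affine matrix `theta`, the cosine-distance objective, and every function name below are assumptions chosen for illustration. The feature maps stand in for DINO-ViT features and are plain arrays here.

```python
import numpy as np

def atlas_grid(height, width):
    """Return (height*width, 2) normalized (x, y) atlas coordinates in [-1, 1]."""
    ys, xs = np.meshgrid(
        np.linspace(-1, 1, height), np.linspace(-1, 1, width), indexing="ij"
    )
    return np.stack([xs.ravel(), ys.ravel()], axis=1)

def apply_affine(points, theta):
    """Map atlas points into image coordinates with a 2x3 affine matrix.

    The affine form is an assumption of this sketch; the paper describes
    dense mappings without giving their parameterization in the abstract.
    """
    return points @ theta[:, :2].T + theta[:, 2]

def bilinear_sample(features, points):
    """Sample an (h, w, c) feature map at normalized (x, y) points.

    Points that fall outside the image are clamped to the border.
    """
    h, w, _ = features.shape
    x = np.clip((points[:, 0] + 1) * 0.5 * (w - 1), 0, w - 1)
    y = np.clip((points[:, 1] + 1) * 0.5 * (h - 1), 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = features[y0, x0] * (1 - wx) + features[y0, x1] * wx
    bottom = features[y1, x0] * (1 - wx) + features[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def alignment_loss(atlas, features, theta):
    """Mean cosine distance between atlas features and the image features
    sampled at the mapped atlas coordinates (an illustrative objective)."""
    atlas_h, atlas_w, channels = atlas.shape
    mapped = apply_affine(atlas_grid(atlas_h, atlas_w), theta)
    sampled = bilinear_sample(features, mapped)
    flat_atlas = atlas.reshape(-1, channels)
    norms = np.linalg.norm(flat_atlas, axis=1) * np.linalg.norm(sampled, axis=1)
    cosine = np.sum(flat_atlas * sampled, axis=1) / (norms + 1e-8)
    return float(np.mean(1.0 - cosine))
```

In this sketch, calling `alignment_loss(atlas, features, theta)` once per image and summing the results gives a joint objective over the set. Optimizing both `atlas` and each image's `theta` against that sum is the rough analogue of learning the atlas and mappings per image set, though the actual method would need further terms to restrict the loss to shared content.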
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
