Unsupervised Learning of Object Landmarks through Conditional Image Generation
Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

TL;DR
This paper introduces an unsupervised method for learning object landmarks. A generator reconstructs an image from the appearance of one example and the geometry of a second, so landmark detectors emerge without manual labels.
Contribution
The authors present an unsupervised approach that learns object landmarks through conditional image generation. It outperforms existing unsupervised landmark detectors and applies to a variety of object types.
Findings
Successfully learned landmarks from synthetic deformations and videos
Outperformed state-of-the-art unsupervised landmark detectors
Applied to diverse datasets including faces, objects, and digits
Abstract
We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision. We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the two examples differ by a viewpoint change and/or an object deformation. In order to factorize appearance and geometry, we introduce a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features. Compared to standard image generation problems, which often use generative adversarial networks, our generation task is conditioned on both appearance and geometry and thus is significantly less ambiguous, to the point that adopting a simple perceptual loss formulation is sufficient. We demonstrate that our…
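The "tight bottleneck" mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the mechanism, so the code below makes an assumption: each landmark channel is reduced to a single 2D coordinate by a spatial soft-argmax, and is then re-rendered as a Gaussian blob, so that only location information reaches the generator. The function names (soft_argmax, render_gaussian, geometry_bottleneck) and the sigma parameter are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math


def soft_argmax(score_map):
    """Expected (row, col) position under a spatial softmax of score_map."""
    peak = max(max(row) for row in score_map)  # subtract max for numerical stability
    weights = [[math.exp(v - peak) for v in row] for row in score_map]
    total = sum(sum(row) for row in weights)
    exp_row = sum(r * w for r, row in enumerate(weights) for w in row) / total
    exp_col = sum(c * w for row in weights for c, w in enumerate(row)) / total
    return exp_row, exp_col


def render_gaussian(center, height, width, sigma=1.0):
    """Render an isotropic Gaussian blob centred on a (row, col) coordinate."""
    cy, cx = center
    return [
        [math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2.0 * sigma ** 2))
         for c in range(width)]
        for r in range(height)
    ]


def geometry_bottleneck(score_maps, sigma=1.0):
    """Assumed bottleneck: K score maps -> K coordinates -> K Gaussian maps.

    Everything except the K landmark locations is discarded, which is what
    would prevent appearance from leaking through the geometry branch.
    """
    height, width = len(score_maps[0]), len(score_maps[0][0])
    landmarks = [soft_argmax(m) for m in score_maps]
    blobs = [render_gaussian(p, height, width, sigma) for p in landmarks]
    return landmarks, blobs
```

Under this assumption, a generator would receive the Gaussian maps computed from the second image together with appearance features from the first image, and would be trained to reconstruct the second image.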
Taxonomy
Topics: Face recognition and analysis · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
