# Self-supervised Learning of Interpretable Keypoints from Unlabelled Videos

**Authors:** Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi

arXiv: 1907.02055 · 2020-12-24

## TL;DR

This paper introduces KeypointGAN, a self-supervised approach that learns interpretable object keypoints from unlabelled videos. It analyses the differences between frames and uses a dual geometric representation (2D keypoints and a skeleton image). It achieves state-of-the-art results among methods that use no labelled images.

## Contribution

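The method combines a dual geometric representation (a set of 2D keypoints and a skeleton image) with empirical pose priors taken from unpaired data, such as a different dataset or mocap. As a result, the pose recognition network is learned without any annotated images.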

## Key findings

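- Achieves state-of-the-art performance on standard human and face pose recognition benchmarks among methods that do not require labelled images.
- Learns interpretable keypoints using only unlabelled videos and a weak empirical pose prior.
- Disentangles pose from appearance through a tight geometric bottleneck.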

## Abstract

We propose KeypointGAN, a new method for recognizing the pose of objects from a single image that uses only unlabelled videos and a weak empirical prior on the object poses for learning. Video frames differ primarily in the pose of the objects they contain, so our method distils the pose information by analyzing the differences between frames. The distillation uses a new dual representation of the geometry of objects as a set of 2D keypoints and as a pictorial representation, i.e. a skeleton image. This has three benefits: (1) it provides a tight 'geometric bottleneck' which disentangles pose from appearance, (2) it can leverage powerful image-to-image translation networks to map between photometry and geometry, and (3) it allows empirical pose priors to be incorporated in the learning process. The pose priors are obtained from unpaired data, such as a different dataset or a different modality such as mocap, so that no annotated image is ever used in learning the pose recognition network. In standard benchmarks for pose recognition for humans and faces, our method achieves state-of-the-art performance among methods that do not require any labelled images for training.
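The dual representation can be illustrated with a small renderer that turns a set of 2D keypoints into a skeleton image. This is a minimal sketch under stated assumptions and is not the authors' code. It assumes each limb is a line segment between two connected keypoints. Each pixel gets the value exp(-γ d²), where d is its distance to the nearest segment. The function names, the `edges` limb list and the default `gamma` are hypothetical. Written in an autodiff framework instead of plain Python, the same computation would let a network pass gradients from the skeleton image back to the keypoint coordinates.

```python
import math


def _dist_sq_to_segment(p, a, b):
    """Squared Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    if denom == 0.0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Project p onto the infinite line, then clamp to the segment.
        t = ((px - ax) * dx + (py - ay) * dy) / denom
        t = max(0.0, min(1.0, t))
    cx, cy = ax + t * dx, ay + t * dy
    return (px - cx) ** 2 + (py - cy) ** 2


def render_skeleton(keypoints, edges, height, width, gamma=1.0):
    """Render keypoints as a soft skeleton image (hypothetical helper).

    keypoints: list of (x, y) pixel coordinates.
    edges: list of (i, j) index pairs, one per limb (assumed topology).
    Returns a height x width list of rows with values in [0, 1];
    pixels lying on a limb get 1.0, and values decay with distance.
    """
    image = [[0.0] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            # Distance to the closest limb determines the intensity.
            best = math.inf
            for i, j in edges:
                d2 = _dist_sq_to_segment((u, v), keypoints[i], keypoints[j])
                best = min(best, d2)
            image[v][u] = math.exp(-gamma * best)
    return image


# Usage: a three-keypoint "arm" with two limbs on a 10 x 10 canvas.
kps = [(2.0, 2.0), (2.0, 7.0), (7.0, 7.0)]
limbs = [(0, 1), (1, 2)]
img = render_skeleton(kps, limbs, height=10, width=10)
```

The image is indexed as `img[row][column]`, i.e. `img[y][x]`. Pixels on a limb, such as `img[5][2]`, have value 1.0. A pixel one unit away from the nearest limb, such as `img[5][3]`, has value exp(-1) ≈ 0.37.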

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.02055/full.md

## Figures

39 figures with captions in the complete paper: https://tomesphere.com/paper/1907.02055/full.md

## References

82 references — full list in the complete paper: https://tomesphere.com/paper/1907.02055/full.md

---
Source: https://tomesphere.com/paper/1907.02055