AETv2: AutoEncoding Transformations for Self-Supervised Representation Learning by Minimizing Geodesic Distances in Lie Groups

Feng Lin; Haohang Xu; Houqiang Li; Hongkai Xiong; Guo-Jun Qi

arXiv:1911.07004 · cs.CV · November 19, 2019 · 1 citation

TL;DR

AETv2 is a self-supervised learning method that measures the error of predicted transformations by geodesic distance on a Lie group rather than by Euclidean distance, so that training respects the curved manifold on which the transformations lie.

Contribution

It proposes a new approach to measure transformation deviations on Lie groups using geodesic distances, enhancing self-supervised learning effectiveness.

Findings

01. AETv2 outperforms previous models in multiple tasks.

02. Using geodesic distances improves transformation estimation accuracy.

03. The method effectively captures the manifold structure of transformations.

Abstract

Self-supervised learning by predicting transformations has demonstrated outstanding performance in both unsupervised and (semi-)supervised tasks. Among the state-of-the-art methods is AutoEncoding Transformations (AET), which decodes transformations from the learned representations of original and transformed images. Both deterministic and probabilistic AETs rely on the Euclidean distance to measure the deviation of estimated transformations from their ground-truth counterparts. However, this assumption is questionable, as a group of transformations often resides on a curved manifold rather than in a flat Euclidean space. For this reason, we should use the geodesic to characterize how an image transforms along the manifold of a transformation group, and adopt its length to measure the deviation between transformations. In particular, we propose to autoencode a Lie group of homography…
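As a minimal illustration of the gap between Euclidean and geodesic distance that the abstract describes, the sketch below compares the two on the rotation group SO(3). This choice is an assumption made for simplicity: the paper works with a Lie group of homography transformations, and SO(3) is used here only because its geodesic distance has a well-known closed form (the rotation angle of the relative rotation). All function names are hypothetical, and the code is not the authors' implementation.

```python
import math

# Illustrative stand-in: SO(3) rotations as 3x3 nested lists.
# The paper itself targets homographies; this is NOT its loss function.

def rot_z(theta):
    """Rotation by `theta` radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_x(theta):
    """Rotation by `theta` radians about the x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose of a 3x3 matrix (the inverse, for a rotation)."""
    return [list(row) for row in zip(*a)]

def euclidean_distance(a, b):
    """Frobenius norm of a - b: distance in the flat ambient space."""
    return math.sqrt(sum((a[i][j] - b[i][j]) ** 2
                         for i in range(3) for j in range(3)))

def geodesic_distance(a, b):
    """Angle of the relative rotation a^T b: path length along SO(3)."""
    rel = matmul(transpose(a), b)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)

if __name__ == "__main__":
    identity = rot_z(0.0)
    for angle in (0.1, 1.0, 2.0, math.pi):
        target = rot_z(angle)
        print(angle,
              euclidean_distance(identity, target),
              geodesic_distance(identity, target))
```

For small angles the two distances are nearly proportional (the Frobenius distance is about √2 times the angle), but the Euclidean distance saturates at 2√2 as the angle approaches π, whereas the geodesic distance keeps growing linearly with the angle. This is the kind of mismatch that motivates measuring transformation error along the manifold.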

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition