Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks
Jianbo Jiao, João F. Henriques

TL;DR
This paper introduces a novel auto-encoder architecture that learns to be equivariant to arbitrary transformations, including non-geometric ones, purely from data, enhancing interpretability and robustness of deep networks.
Contribution
It proposes a flexible auto-encoder model that enforces multiple equivariance relations simultaneously without explicit transformation models.
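For reference, the equivariance property referred to here can be written in the standard way. The symbols below (encoder $f$, input-space transformation $T_g$, embedding-space transformation $T'_g$, family index $k$) are notation assumed for illustration and are not taken from the paper.

```latex
% Equivariance of an encoder f to a transformation with parameter g:
% transforming the input and then encoding gives the same result as
% encoding and then applying the corresponding transformation to the embedding.
\[
  f(T_g\, x) = T'_g\, f(x) \qquad \text{for all inputs } x \text{ and all } g .
\]

% Several equivariance relations holding simultaneously, one per
% transformation family k (e.g. translation, rotation, colour change):
\[
  f\bigl(T^{(k)}_g\, x\bigr) = T'^{(k)}_g\, f(x) \qquad \text{for every family } k .
\]
```

For a CNN and image translation, both $T_g$ and $T'_g$ are shifts: shifting the image shifts the feature map by the corresponding amount.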
Findings
Successfully re-renders transformed images on synthetic and real datasets.
Achieves object pose estimation with improved robustness.
Reduces to a CNN in the special case of translation equivariance.
Abstract
In this work we investigate how to achieve equivariance to input transformations in deep networks, purely from data, without being given a model of those transformations. Convolutional Neural Networks (CNNs), for example, are equivariant to image translation, a transformation that can be easily modelled (by shifting the pixels vertically or horizontally). Other transformations, such as out-of-plane rotations, do not admit a simple analytic model. We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously, such as translation, rotation, colour changes, and many others. This means that it can take an input image, and produce versions transformed by a given amount that were not observed before (e.g. a different point of view of the same object, or a colour variation). Despite extending to many (even non-geometric)…
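The translation equivariance of convolutions mentioned in the abstract can be checked numerically. The following is a minimal sketch, not the paper's model: it assumes circular (wrap-around) shifts so that the property holds exactly at the image borders, and the helper names `circular_conv2d` and `shift` are hypothetical.

```python
import numpy as np

def circular_conv2d(image, kernel):
    """Cross-correlate `image` with `kernel` using wrap-around borders."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # np.roll by (-i, -j) aligns image[y + i, x + j] with position (y, x).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

def shift(image, dy, dx):
    """Translate `image` by (dy, dx) pixels with wrap-around."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

# Equivariance: shifting then convolving equals convolving then shifting.
shift_then_conv = circular_conv2d(shift(image, 2, 3), kernel)
conv_then_shift = shift(circular_conv2d(image, kernel), 2, 3)
equivariant = np.allclose(shift_then_conv, conv_then_shift)
print(equivariant)
```

With zero padding instead of wrap-around, the equality would hold only away from the image borders. Transformations such as out-of-plane rotation have no comparably simple `shift` function on pixels, which is the gap the abstract describes.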
Taxonomy
Topics: Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
