Interpretable Transformations with Encoder-Decoder Networks
Daniel E. Worrall, Stephan J. Garbin, Daniyar Turmukhambetov, Gabriel J. Brostow

TL;DR
This paper introduces a method to create a deep feature space with explicitly disentangled representations of known transformations, enabling controlled manipulation and better understanding of complex image transformations.
Contribution
It proposes a transforming encoder-decoder network with a custom feature transform layer to explicitly disentangle factors like pose, appearance, and illumination.
Findings
Disentangled representations improve interpretability of transformations.
The method enables explicit control over image parameters.
Advantages are demonstrated across various datasets and tasks.
Abstract
Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed…
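To make the idea of a feature transform layer concrete, here is a minimal sketch of one way such a layer could act on an encoded feature vector. It assumes the known transformation is a single rotation angle and that the layer rotates consecutive 2-D pairs of feature dimensions by that angle. That form is an illustrative assumption, since the text above does not specify how the layer is built, and the names `feature_transform`, `z`, and `theta` are hypothetical.

```python
import numpy as np


def feature_transform(z, theta):
    """Rotate consecutive 2-D pairs of a feature vector by theta (radians).

    Illustrative assumption: the layer is a block-diagonal stack of
    identical 2x2 rotation matrices. This is a sketch, not the paper's
    confirmed implementation.
    """
    z = np.asarray(z, dtype=float)
    if z.shape[-1] % 2 != 0:
        raise ValueError("feature dimension must be even")

    # View the last axis as (num_pairs, 2) so each pair is rotated separately.
    pairs = z.reshape(*z.shape[:-1], -1, 2)

    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])

    # Row-vector form of rot @ pair for every pair.
    rotated = pairs @ rot.T
    return rotated.reshape(z.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=8)  # stand-in for an encoder output

    # Applying two transforms in sequence matches one transform by the sum.
    two_steps = feature_transform(feature_transform(z, 0.3), 0.5)
    one_step = feature_transform(z, 0.8)
    print(np.allclose(two_steps, one_step))
```

Under this assumption, the relative feature-space relationship between two rotated inputs is a known linear map, and moving along the transformation amounts to changing `theta` before decoding. In a full model the encoder and decoder would be trained around such a layer; that part is omitted here.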
