Interpretable Transformations with Encoder-Decoder Networks

Daniel E. Worrall; Stephan J. Garbin; Daniyar Turmukhambetov; Gabriel J. Brostow

arXiv:1710.07307 · cs.CV · October 23, 2017

TL;DR

This paper introduces a method to create a deep feature space with explicitly disentangled representations of known transformations, enabling controlled manipulation and better understanding of complex image transformations.

Contribution

It proposes a transforming encoder-decoder network with a custom feature transform layer to explicitly disentangle factors like pose, appearance, and illumination.

Findings

01. Disentangled representations improve interpretability of transformations.

02. The method enables explicit control over image parameters.

03. Advantages are demonstrated across various datasets and tasks.

Abstract

Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space, with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed…
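
To make the idea of manipulating a disentangled representation concrete, here is a minimal sketch of what a feature transform layer could look like. It assumes the layer acts on a feature vector by rotating consecutive pairs of features through a known angle, an assumption this page does not spell out. The function name `feature_transform` and the pairwise layout are hypothetical and chosen for illustration; the encoder and decoder are omitted.

```python
import numpy as np


def feature_transform(z, theta):
    """Rotate consecutive feature pairs of z by theta radians.

    Hypothetical sketch: assumes the transform layer is a block-diagonal
    2D rotation applied to the encoded features. Not the authors' code.
    """
    z = np.asarray(z, dtype=float)
    if z.shape[-1] % 2 != 0:
        raise ValueError("feature dimension must be even")
    # View the last axis as K pairs of 2D coordinates.
    pairs = z.reshape(*z.shape[:-1], -1, 2)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    # Apply the same 2x2 rotation to every pair.
    rotated = pairs @ rot.T
    return rotated.reshape(z.shape)


# Toy usage: a 4-dimensional "encoding" rotated in feature space.
z = np.array([1.0, 0.0, 0.0, 1.0])
quarter_turn = feature_transform(z, np.pi / 2)

# Two successive transforms compose into one transform by the summed angle.
a, b = 0.3, 1.1
composed = feature_transform(feature_transform(z, a), b)
direct = feature_transform(z, a + b)
```

Under this assumption the relative relationship between two encodings is a known rotation: applying the transform twice is equivalent to applying it once with the summed angle, and the feature norm is unchanged. A decoder applied to the transformed features would then re-render the image at the requested parameter value.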
