Transformation Properties of Learned Visual Representations
Taco S. Cohen, Max Welling

TL;DR
This paper explores how learned visual representations transform under scene motions, using group theory to analyze their properties and demonstrating the importance of latent representations for modeling 3D rotations.
Contribution
It introduces a theoretical framework linking the linearity of visual representations to group irreducibility and demonstrates the necessity of latent spaces for modeling complex motions.
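As a reading aid, the link between linearity and irreducibility can be written in one line. The notation here is an assumption and is not taken from the paper: G is the motion group (taken to be finite or compact), \rho(g) is the matrix by which the representation changes under motion g, the \rho_i are irreducible representations, and A is a fixed invertible change of basis.

```latex
% Complete reducibility (holds for finite or compact G; symbols are illustrative):
% every linear representation is equivalent to a direct sum of irreducible ones.
\rho(g) \;=\; A \Bigl( \bigoplus_{i} \rho_i(g) \Bigr) A^{-1}
\qquad \text{for all } g \in G
```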
Findings
Any representation that transforms linearly under scene motions is equivalent to a combination of irreducible representations.
Under restricted conditions, irreducible representations are decorrelated.
Latent representations enable modeling of 3D rotations under partial observability.
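The first two findings can be illustrated with a toy case that is not from the paper: the group of cyclic shifts acting on 1-D signals, used here as a stand-in for scene motions. The sketch below assumes NumPy; the helper names (`shift_matrix`, `unitary_dft`, `offdiag_max`) are hypothetical. It checks that the Fourier basis reduces every shift to a diagonal matrix (a sum of one-dimensional irreducible representations), and that a covariance that is invariant under shifts becomes diagonal in that basis, so the irreducible components are decorrelated. The decorrelation relies on the distribution being shift-invariant and on each irreducible representation appearing only once.

```python
import numpy as np

def shift_matrix(n, k):
    """Permutation matrix that cyclically shifts a length-n signal by k samples."""
    return np.roll(np.eye(n), k, axis=0)

def unitary_dft(n):
    """Unitary DFT matrix; its rows are the characters of the cyclic group."""
    return np.fft.fft(np.eye(n)) / np.sqrt(n)

def offdiag_max(m):
    """Largest absolute off-diagonal entry of a square matrix."""
    return np.abs(m - np.diag(np.diag(m))).max()

n = 8
F = unitary_dft(n)

# 1) Reducibility: in the Fourier basis every shift acts as a diagonal matrix,
#    i.e. as a direct sum of one-dimensional irreducible representations.
max_offdiag_rep = max(
    offdiag_max(F @ shift_matrix(n, k) @ F.conj().T) for k in range(n)
)

# 2) Decorrelation: start from an arbitrary covariance, then average it over
#    the group so that it is shift-invariant (an assumption of this toy setup).
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
C = B @ B.T
C_inv = sum(
    shift_matrix(n, k) @ C @ shift_matrix(n, k).T for k in range(n)
) / n

# The invariant covariance is correlated in the pixel basis but diagonal
# in the basis of irreducible components.
max_offdiag_pixels = offdiag_max(C_inv)
max_offdiag_irreps = offdiag_max(F @ C_inv @ F.conj().T)
```

Comparing `max_offdiag_pixels` with `max_offdiag_irreps` shows the effect: the first is clearly nonzero, while the second vanishes up to floating-point error. For 3D rotations the irreducible blocks are larger than one-dimensional, so this example only conveys the general pattern.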
Abstract
When a three-dimensional object moves relative to an observer, a change occurs on the observer's image plane and in the visual representation computed by a learned model. Starting with the idea that a good visual representation is one that transforms linearly under scene motions, we show, using the theory of group representations, that any such representation is equivalent to a combination of the elementary irreducible representations. We derive a striking relationship between irreducibility and the statistical dependency structure of the representation, by showing that under restricted conditions, irreducible representations are decorrelated. Under partial observability, as induced by the perspective projection of a scene onto the image plane, the motion group does not have a linear action on the space of images, so that it becomes necessary to perform inference over a latent…
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization
