PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations
Rico Jonschkowski, Roland Hafner, Jonathan Scholz, and Martin Riedmiller

TL;DR
PVEs are unsupervised encoders that learn to extract the positions and velocities of task-relevant objects from images. They are trained with physical priors instead of image reconstruction, yielding structured state representations for control tasks.
Contribution
This paper introduces PVEs, an unsupervised method that encodes images into positions and velocities. Unlike autoencoders, PVEs are trained by making the representation consistent with physical priors rather than by reconstructing the input image.
Findings
Applied PVEs to several simulated control tasks from pixel inputs.
Achieved promising preliminary results in learning structured state representations.
The results suggest that physical priors can serve as the training signal for unsupervised state representation learning, in place of image reconstruction.
Abstract
We propose position-velocity encoders (PVEs) which learn, without supervision, to encode images to positions and velocities of task-relevant objects. PVEs encode a single image into a low-dimensional position state and compute the velocity state from finite differences in position. In contrast to autoencoders, position-velocity encoders are not trained by image reconstruction, but by making the position-velocity representation consistent with priors about interacting with the physical world. We applied PVEs to several simulated control tasks from pixels and achieved promising preliminary results.
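The split described in the abstract, position from a single image and velocity from finite differences in position, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear `encode_position` function, the `weights` matrix, the time step `dt`, and the random toy images are all assumptions made for illustration. In a PVE the encoder would be a learned network trained with the physical priors, which are not shown here.

```python
import numpy as np


def encode_position(image, weights):
    """Hypothetical stand-in encoder (assumption): a linear map from the
    flattened pixels of ONE image to a low-dimensional position state."""
    return weights @ image.ravel()


def position_velocity_states(images, weights, dt=1.0):
    """Encode each frame to a position, then take finite differences.

    images:  array of shape (T, H, W), a sequence of T frames
    weights: array of shape (D, H * W), the stand-in encoder parameters
    dt:      assumed time between consecutive frames
    Returns positions and velocities for frames 1..T-1, each of shape (T-1, D).
    """
    positions = np.stack([encode_position(img, weights) for img in images])
    # Backward finite difference: velocity at frame t uses frames t and t-1.
    velocities = (positions[1:] - positions[:-1]) / dt
    # The first frame has no predecessor, so it is dropped from the output.
    return positions[1:], velocities


# Toy usage with random data (assumption: 5 frames of 8x8 pixels, 2-D state).
rng = np.random.default_rng(0)
images = rng.random((5, 8, 8))
weights = rng.normal(size=(2, 64))
pos, vel = position_velocity_states(images, weights, dt=1.0)
```

Because velocity is derived from positions rather than predicted separately, only the position encoder has parameters in this sketch, and the velocity state is consistent with the position state by construction.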
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
