PVEs: Position-Velocity Encoders for Unsupervised Learning of Structured State Representations

Rico Jonschkowski, Roland Hafner, Jonathan Scholz, and Martin Riedmiller

arXiv:1705.09805·cs.RO·July 25, 2017·36 cites

TL;DR

PVEs are unsupervised encoders that learn to extract the positions and velocities of task-relevant objects from images. They are trained with physical priors instead of image reconstruction, yielding structured state representations for control tasks.

Contribution

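This paper introduces PVEs, an unsupervised method that encodes images into the positions and velocities of task-relevant objects. Unlike autoencoders, PVEs are trained by making the representation consistent with physical priors rather than by reconstructing the input image.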

Findings

01

Successfully applied PVEs to simulated control tasks from pixel inputs.

02

Achieved promising preliminary results in learning structured state representations.

03

Indicated, on the basis of preliminary results, that priors about interacting with the physical world can serve as the training signal in place of image reconstruction.

Abstract

We propose position-velocity encoders (PVEs) which learn, without supervision, to encode images to positions and velocities of task-relevant objects. PVEs encode a single image into a low-dimensional position state and compute the velocity state from finite differences in position. In contrast to autoencoders, position-velocity encoders are not trained by image reconstruction, but by making the position-velocity representation consistent with priors about interacting with the physical world. We applied PVEs to several simulated control tasks from pixels and achieved promising preliminary results.
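
The finite-difference step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the encoder has already produced one position vector per frame; the function name `finite_difference_velocity`, the time step `dt`, and the sample values are hypothetical and are not taken from the paper, which may use a different differencing scheme or scaling.

```python
from typing import List, Sequence


def finite_difference_velocity(
    positions: Sequence[Sequence[float]], dt: float = 1.0
) -> List[List[float]]:
    """Backward finite differences: v_t = (p_t - p_{t-1}) / dt.

    Returns one velocity per consecutive pair of positions, so the
    result has len(positions) - 1 entries.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    velocities = []
    for prev, curr in zip(positions, positions[1:]):
        velocities.append([(c - p) / dt for p, c in zip(prev, curr)])
    return velocities


# Hypothetical encoder outputs: 2-D position states for four consecutive frames.
positions = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [4.0, 1.0]]

# Assumed frame interval of 0.5 time units (illustrative only).
velocities = finite_difference_velocity(positions, dt=0.5)
print(velocities)
```

Because the velocity is derived from positions rather than predicted separately, only the image-to-position mapping has to be learned in this sketch.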

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning