A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

Marco Fraccaro; Simon Kamronn; Ulrich Paquet; Ole Winther

arXiv:1710.05741·stat.ML·October 31, 2017·116 cites

TL;DR

This paper introduces the Kalman variational auto-encoder, which learns two disentangled latent representations from video: an object's representation and a latent state describing its dynamics. The evolution of the scene can then be imagined, and missing data imputed, without generating high-dimensional frames at each time step.

Contribution

It presents a novel unsupervised model that separates object recognition from dynamic state evolution in latent space, enhancing video understanding and prediction.

Findings

01 Outperforms competing methods in generative tasks

02 Outperforms competing methods in missing data imputation

03 Trained end-to-end on videos of a variety of simulated physical systems

Abstract

This paper takes a step towards temporal reasoning in a dynamically changing video, not in the pixel space that constitutes its frames, but in a latent space that describes the non-linear dynamics of the objects in its world. We introduce the Kalman variational auto-encoder, a framework for unsupervised learning of sequential data that disentangles two latent representations: an object's representation, coming from a recognition model, and a latent state describing its dynamics. As a result, the evolution of the world can be imagined and missing data imputed, both without the need to generate high dimensional frames at each time step. The model is trained end-to-end on videos of a variety of simulated physical systems, and outperforms competing methods in generative and missing data imputation tasks.
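
To make the idea of imputing missing data in latent space concrete, here is a minimal sketch of a scalar Kalman filter that skips the update step when an observation is missing. It is an illustration under stated assumptions, not the authors' implementation: the function name `kalman_filter_missing`, the scalar state, and the fixed parameters `trans`, `emit`, `q` and `r` are all hypothetical, and the list `encoded` merely stands in for low-dimensional outputs of a recognition model. The model described in the abstract has non-linear dynamics and learned parameters, which this sketch does not attempt to reproduce.

```python
# Hypothetical illustration only; not code from the simonkamronn/kvae repository.

def kalman_filter_missing(encoded, trans=1.0, emit=1.0, q=0.01, r=0.1,
                          mean0=0.0, var0=1.0):
    """Scalar Kalman filter that tolerates missing observations (None).

    Assumed state model:        z_t = trans * z_{t-1} + noise with variance q
    Assumed observation model:  a_t = emit * z_t + noise with variance r
    Returns a list of (mean, variance) filtered estimates of z_t.
    """
    mean, var = mean0, var0
    estimates = []
    for obs in encoded:
        # Predict step: propagate the belief through the dynamics.
        mean = trans * mean
        var = trans * trans * var + q
        if obs is not None:
            # Update step: correct the prediction with the observation.
            gain = var * emit / (emit * emit * var + r)
            mean = mean + gain * (obs - emit * mean)
            var = (1.0 - gain * emit) * var
        # When obs is None the prediction alone serves as the imputed belief.
        estimates.append((mean, var))
    return estimates


# Two observed steps, two missing steps, then one observed step.
estimates = kalman_filter_missing([1.0, 1.0, None, None, 1.0])
for step, (mean, var) in enumerate(estimates):
    print(step, round(mean, 4), round(var, 4))
```

During the two missing steps the filter only predicts, so with `trans=1.0` the mean stays where it was and the variance grows by `q` per step; the next observation pulls the variance back down. All of this happens on a single scalar, without reconstructing any frame.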

Code & Models

Repositories

simonkamronn/kvae (TensorFlow, official)

Taxonomy

Topics: Advanced Image Processing Techniques · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks