Interpretable Representation Learning from Videos using Nonlinear Priors

Marian Longa; João F. Henriques

arXiv:2410.18539·cs.CV·October 25, 2024

Open Access

TL;DR

This paper introduces a deep learning framework that incorporates nonlinear physical priors into video representation learning, enabling the generation of interpretable and physically plausible hypothetical videos.

Contribution

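The paper extends the VAE prior from a simple isotropic Gaussian to a nonlinear temporal Additive Noise Model (ANM) and develops a novel linearization method that approximates this prior with a Gaussian Mixture Model (GMM) for stable training, so that the learned latent variables are physically interpretable and can be used for video generation.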
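To make the idea of a nonlinear temporal ANM prior concrete, here is a minimal sketch in which the latent state evolves as z_{t+1} = f(z_t) + noise, with f a pendulum update. This is an illustration only, not the authors' implementation: the function names (`pendulum_step`, `sample_anm_trajectory`), the semi-implicit Euler integrator, and all parameter values are assumptions made for this example.

```python
import numpy as np

def pendulum_step(z, dt=0.05, g=9.81, length=1.0):
    """Deterministic nonlinear transition f(z) for z = (angle, angular velocity).

    Hypothetical helper: a semi-implicit Euler step of an ideal pendulum.
    """
    theta, omega = z
    omega_next = omega - (g / length) * np.sin(theta) * dt
    theta_next = theta + omega_next * dt
    return np.array([theta_next, omega_next])

def sample_anm_trajectory(z0, n_steps, noise_std, rng, **physics):
    """Sample z_{t+1} = f(z_t) + eps_t with eps_t ~ N(0, noise_std^2 I)."""
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(n_steps):
        eps = rng.normal(0.0, noise_std, size=2)  # additive Gaussian noise
        traj.append(pendulum_step(traj[-1], **physics) + eps)
    return np.stack(traj)

rng = np.random.default_rng(0)
# Latent trajectory under the assumed "training" physics.
observed = sample_anm_trajectory([0.5, 0.0], 100, 0.01, rng, length=1.0)
# Intervening on a physical variable (here the pendulum length) gives a
# hypothetical trajectory that a decoder could, in principle, render as video.
hypothetical = sample_anm_trajectory([0.5, 0.0], 100, 0.01, rng, length=2.0)
```

Because the physical parameters enter only through `f`, changing one of them is a simple form of intervention on an interpretable variable; the paper's actual model and training procedure may differ substantially from this toy setup.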

Findings

1. Successfully learned physical variables from real-world physics videos.

2. Generated plausible hypothetical videos by intervening on physical variables.

3. Validated on diverse physics scenarios like pendulums and pulsars.

Abstract

Learning interpretable representations of visual data is an important challenge, to make machines' decisions understandable to humans and to improve generalisation outside of the training distribution. To this end, we propose a deep learning framework where one can specify nonlinear priors for videos (e.g. of Newtonian physics) that allow the model to learn interpretable latent variables and use these to generate videos of hypothetical scenarios not observed at training time. We do this by extending the Variational Auto-Encoder (VAE) prior from a simple isotropic Gaussian to an arbitrary nonlinear temporal Additive Noise Model (ANM), which can describe a large number of processes (e.g. Newtonian physics). We propose a novel linearization method that constructs a Gaussian Mixture Model (GMM) approximating the prior, and derive a numerically stable Monte Carlo estimate of the KL…
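The abstract is cut off before it describes the estimator, so the following is only a generic sketch of how a Monte Carlo estimate of the KL divergence between a Gaussian posterior and a GMM prior can be kept numerically stable, here by evaluating the mixture density with a max-shifted log-sum-exp. The function names, the diagonal-covariance assumption, and the sample counts are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def log_normal_diag(x, mean, std):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return np.sum(
        -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi),
        axis=-1,
    )

def log_gmm(x, weights, means, stds):
    """Log density of a diagonal GMM using a max-shifted log-sum-exp.

    Shapes: x (n, d); means and stds (k, d); weights (k,).
    """
    comp = log_normal_diag(x[:, None, :], means[None], stds[None]) + np.log(weights)
    m = comp.max(axis=1, keepdims=True)  # shift so the largest term is exp(0)
    return (m + np.log(np.sum(np.exp(comp - m), axis=1, keepdims=True)))[:, 0]

def mc_kl_gauss_to_gmm(q_mean, q_std, weights, means, stds, n_samples, rng):
    """Estimate KL(q || p) = E_q[log q(z) - log p(z)] with samples z ~ q."""
    z = q_mean + q_std * rng.normal(size=(n_samples, q_mean.shape[0]))
    return np.mean(log_normal_diag(z, q_mean, q_std) - log_gmm(z, weights, means, stds))

rng = np.random.default_rng(0)
# Sanity check against a closed form: KL(N(0,1) || N(1,1)) = 0.5.
kl = mc_kl_gauss_to_gmm(
    np.array([0.0]), np.array([1.0]),
    np.array([1.0]), np.array([[1.0]]), np.array([[1.0]]),
    200_000, rng,
)
```

Without the shift by `m`, the exponentials underflow to zero whenever a sample lies far from every mixture component, and the logarithm then returns negative infinity; the shifted form stays finite in that regime.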

Taxonomy

Topics: Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications