Visual Interaction Networks
Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran

TL;DR
The paper introduces the Visual Interaction Network, a model that learns to predict the future states of physical systems directly from raw visual data, combining perception and dynamics prediction.
Contribution
It presents a novel end-to-end trainable model that jointly learns visual parsing and physical dynamics prediction from raw videos.
Findings
Accurately predicts physical trajectories over hundreds of time steps.
Can infer invisible object states and unknown masses.
Works across diverse physical systems from minimal visual input.
Abstract
From just a glance, humans can make rich predictions about the future state of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains and require direct measurements of the underlying states. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions and dynamics, producing a predicted physical trajectory of arbitrary length. We…
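The dynamics predictor described above rolls per-object latent states forward by computing pairwise interactions. The sketch below shows what one generic interaction-network update and a rollout of arbitrary length might look like. It is an illustration under stated assumptions, not the authors' implementation. It assumes K object slots of dimension D, two-layer MLPs with random, untrained weights, summed pairwise effects, and a residual state update. The names `init_mlp`, `interaction_step` and `rollout` are hypothetical. The convolutional perceptual front-end and the joint training are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a hypothetical two-layer MLP (untrained)."""
    return (rng.normal(0.0, 0.1, (in_dim, hidden)), np.zeros(hidden),
            rng.normal(0.0, 0.1, (hidden, out_dim)), np.zeros(out_dim))

def mlp(params, x):
    """Apply the MLP: linear -> ReLU -> linear. Works on 1-D or batched input."""
    w1, b1, w2, b2 = params
    h = np.maximum(0.0, x @ w1 + b1)
    return h @ w2 + b2

def interaction_step(states, rel_params, obj_params):
    """One interaction-network update over K object slots.

    states: array of shape (K, D), one latent vector per object.
    Returns predicted next states with the same shape.
    """
    K = states.shape[0]
    effect_dim = rel_params[2].shape[1]
    effects = np.zeros((K, effect_dim))
    # Relation model: every ordered pair (i, j), i != j, produces an effect on i.
    for i in range(K):
        for j in range(K):
            if i != j:
                pair = np.concatenate([states[i], states[j]])
                effects[i] += mlp(rel_params, pair)
    # Object model: combine each state with its summed effects (residual update, an assumption).
    return states + mlp(obj_params, np.concatenate([states, effects], axis=1))

def rollout(state0, steps, rel_params, obj_params):
    """Apply the step repeatedly to produce a trajectory of length steps + 1."""
    traj = [state0]
    for _ in range(steps):
        traj.append(interaction_step(traj[-1], rel_params, obj_params))
    return np.stack(traj)

# Usage: 3 objects with 4-dimensional latent states, rolled forward 10 steps.
K, D, E, H = 3, 4, 8, 32
rel = init_mlp(2 * D, H, E)
obj = init_mlp(D + E, H, D)
s0 = rng.normal(size=(K, D))
traj = rollout(s0, 10, rel, obj)
print(traj.shape)  # (11, 3, 4)
```

Because the same relation and object functions are shared across all slots, the update treats objects symmetrically: permuting the input slots permutes the predicted states in the same way. This is what lets an interaction-network predictor apply to scenes with a varying number of objects.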
Taxonomy
Topics: Data Visualization and Analytics · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications
