TL;DR
This paper introduces a generative model that predicts natural object deformations in response to a local poke at a single pixel, enabling interactive control over object dynamics in static images. Training uses only videos of moving objects, with no information about the underlying physical manipulation.
Contribution
It presents a novel approach that learns object dynamics from videos to enable local interactive deformation prediction on static images, applicable to unseen objects.
Findings
Effective in predicting realistic object deformations
Outperforms common video prediction frameworks
Works on diverse object categories
Abstract
What would be the effect of locally poking a static scene? We present an approach that learns natural-looking global articulations caused by a local manipulation at the pixel level. Training requires only videos of moving objects and no information about the underlying manipulation of the physical scene. Our generative model learns to infer natural object dynamics in response to user interaction and learns the interrelations between different object body regions. Given a static image of an object and a local poke at a pixel, the approach then predicts how the object would deform over time. In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos but enable local interactive control of the deformation. Our model is not restricted to particular object categories and can transfer dynamics onto novel unseen object instances. Extensive…
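The abstract does not specify how a poke is represented. As a purely illustrative sketch, one could assume a poke is a pixel location plus a displacement vector, stored as a displacement grid that is zero everywhere except at the poked pixel. The function name `encode_poke`, its parameters, and this encoding are hypothetical and are not taken from the paper.

```python
def encode_poke(height, width, row, col, d_row, d_col):
    """Build an H x W grid of (d_row, d_col) displacements.

    Hypothetical encoding (an assumption, not the paper's): every cell is
    (0.0, 0.0) except the poked pixel, which holds the user's displacement.
    """
    if not (0 <= row < height and 0 <= col < width):
        raise ValueError("poke location lies outside the image")
    grid = [[(0.0, 0.0) for _ in range(width)] for _ in range(height)]
    grid[row][col] = (float(d_row), float(d_col))
    return grid


# Poke the pixel at row 1, column 2 of a 4 x 6 image, pushing it up and right.
poke = encode_poke(height=4, width=6, row=1, col=2, d_row=-1.5, d_col=3.0)

# Collect the locations that carry a non-zero displacement.
nonzero = [(r, c) for r in range(4) for c in range(6) if poke[r][c] != (0.0, 0.0)]
```

Under this assumption, a model would receive the static image together with such a grid and predict a sequence of future frames; the sparse grid is what makes the control local, since only one pixel carries a user-specified motion.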
