Learning Physical Intuition of Block Towers by Example
Adam Lerer, Sam Gross, Rob Fergus

TL;DR
This paper shows that deep convolutional networks trained on simulated data can learn physical intuition about block-tower stability: they predict whether a tower will fall and estimate block trajectories, and on images of real blocks they perform comparably to human subjects.
Contribution
It introduces a method for training neural networks on rendered simulations of block towers so that they learn physical intuition and generalize to images of real wooden blocks.
Findings
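Models accurately predict whether a tower will collapse and estimate the block trajectories.
Models generalize to new physical scenarios, e.g. towers with an additional block.
On images of real wooden blocks, performance is comparable to that of human subjects.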
Abstract
Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the block trajectories. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block and (ii) to images of real wooden blocks, where they obtain performance comparable to human subjects.
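To make the idea of "towers whose stability is randomized" concrete, the following is a minimal sketch of how labeled towers could be sampled. It is not the authors' pipeline: the paper renders towers with a 3D game engine and observes the simulated outcome, whereas this sketch swaps in a simplified 2D static rule (a tower of equal-mass, equal-width blocks stands only if, at every level, the center of mass of everything above lies over the supporting block). The names `is_stable`, `sample_tower`, `make_dataset`, and the parameters `BLOCK_WIDTH` and `max_offset` are hypothetical and chosen for illustration.

```python
import random

BLOCK_WIDTH = 1.0  # assumption: all blocks share one width and one mass


def is_stable(centers, width=BLOCK_WIDTH):
    """Return True if a 2D stack stands under a static center-of-mass rule.

    centers[i] is the horizontal center of block i, listed bottom first.
    This is a simplification, not the game-engine simulation in the paper.
    """
    for k in range(len(centers) - 1):
        above = centers[k + 1:]
        com = sum(above) / len(above)  # center of mass of all blocks above k
        if abs(com - centers[k]) >= width / 2:
            return False  # the load overhangs block k, so the stack topples
    return True


def sample_tower(n_blocks, max_offset, rng):
    """Stack blocks, shifting each one randomly relative to the block below."""
    centers = [0.0]
    for _ in range(n_blocks - 1):
        centers.append(centers[-1] + rng.uniform(-max_offset, max_offset))
    return centers


def make_dataset(n_towers, n_blocks, max_offset=0.6, seed=0):
    """Return a list of (centers, stable) pairs with randomized stability."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_towers):
        centers = sample_tower(n_blocks, max_offset, rng)
        data.append((centers, is_stable(centers)))
    return data


if __name__ == "__main__":
    dataset = make_dataset(n_towers=1000, n_blocks=4)
    n_stable = sum(label for _, label in dataset)
    print(f"{n_stable} of {len(dataset)} sampled towers are labeled stable")
```

In a setup like the one the abstract describes, each sampled configuration would be rendered to an image and the label (or the simulated trajectory) would serve as the training target for the convolutional network. Calling `make_dataset` with a larger `n_blocks` gives a rough analogue of the "additional block" generalization test.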
Taxonomy
Topics: Human Pose and Action Recognition · Music Technology and Sound Studies · Advanced Vision and Imaging
