Opinion: Learning Intuitive Physics May Require More than Visual Data
Ellen Su, Solim Legris, Todd M. Gureckis, Mengye Ren

TL;DR
This study explores whether training on developmentally realistic egocentric videos enhances models' understanding of intuitive physics, finding that data realism alone does not significantly improve performance on physics benchmarks.
Contribution
The paper demonstrates that training on a small, realistic egocentric dataset does not significantly improve deep models' intuitive physics understanding, highlighting the need for other learning strategies.
Findings
Pretraining V-JEPA on SAYCam does not significantly improve IntPhys2 benchmark performance.
Neither data volume nor a developmentally realistic data distribution is sufficient on its own for learning intuitive physics.
The results suggest that current architectures need more than exposure to realistic data to learn intuitive physics.
Abstract
Humans expertly navigate the world by building rich internal models founded on an intuitive understanding of physics. Meanwhile, despite training on vast quantities of internet video data, state-of-the-art deep learning models still fall short of human-level performance on intuitive physics benchmarks. This work investigates whether data distribution, rather than volume, is the key to learning these principles. We pretrain a Video Joint Embedding Predictive Architecture (V-JEPA) model on SAYCam, a developmentally realistic, egocentric video dataset partially capturing three children's everyday visual experiences. We find that training on this dataset, which represents 0.01% of the data volume used to train SOTA models, does not lead to significant performance improvements on the IntPhys2 benchmark. Our results suggest that merely training on a developmentally realistic dataset is…
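The abstract does not say how IntPhys2 performance is scored. Intuitive physics benchmarks of this kind commonly use a violation-of-expectation setup: the model sees a physically possible clip and a matched impossible one, and it is counted correct when it is more "surprised" by the impossible clip. The following is a minimal sketch of that idea, assuming surprise is measured as the mean absolute error between predicted and observed frame embeddings. The function names `surprise` and `pairwise_accuracy` and the toy numbers are hypothetical illustrations, not the authors' code or the benchmark's official metric.

```python
from typing import Sequence, Tuple


def surprise(
    predicted: Sequence[Sequence[float]],
    observed: Sequence[Sequence[float]],
) -> float:
    """Mean absolute error between predicted and observed frame embeddings.

    Assumption: each clip is a list of per-frame embedding vectors, and a
    larger prediction error means the model found the clip more surprising.
    """
    if not predicted or len(predicted) != len(observed):
        raise ValueError("need two non-empty sequences of equal length")
    per_frame = []
    for pred_vec, obs_vec in zip(predicted, observed):
        if not pred_vec or len(pred_vec) != len(obs_vec):
            raise ValueError("embedding vectors must be non-empty and match in size")
        error = sum(abs(p - o) for p, o in zip(pred_vec, obs_vec)) / len(pred_vec)
        per_frame.append(error)
    return sum(per_frame) / len(per_frame)


def pairwise_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (possible, impossible) pairs ranked correctly.

    A pair is correct when the impossible clip has strictly higher surprise.
    Ties count as errors. Chance level is 0.5.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for possible, impossible in pairs if impossible > possible)
    return correct / len(pairs)


# Toy surprise scores for four matched pairs (made-up values).
toy_pairs = [(0.1, 0.5), (0.4, 0.2), (0.3, 0.9), (0.2, 0.2)]
toy_accuracy = pairwise_accuracy(toy_pairs)
```

Under this scoring, a model with no grasp of the tested principle would sit near 0.5, which is the baseline against which a "significant improvement" would be judged.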
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics
