TL;DR
This paper introduces the Zero-shot Visual World Model (ZWM), a computational framework inspired by child development that learns physical scene understanding efficiently from minimal data and generalizes to multiple tasks.
Contribution
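The paper proposes ZWM, a model of early physical scene understanding built on three principles: a sparse temporally-factored predictor that decouples appearance from dynamics, zero-shot estimation through approximate causal inference, and composition of inferences to build more complex abilities.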
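To give a feel for what estimating a quantity by causal inference on a predictor can mean, here is a toy sketch. It is not the paper's implementation: `toy_predictor` is a hypothetical stand-in that shifts every pixel by a fixed amount, and `estimate_motion` is an illustrative helper name. The only idea shown is the generic one of intervening on the input, comparing the perturbed prediction with the unperturbed one, and reading the answer off the difference.

```python
import numpy as np

def toy_predictor(frame, shift=(1, 2)):
    """Hypothetical stand-in for a learned next-frame predictor.

    It simply moves every pixel by `shift` (rows, cols); a real model
    would be learned from video. This is an assumption for illustration.
    """
    return np.roll(frame, shift, axis=(0, 1))

def estimate_motion(predictor, frame, probe_yx, eps=1.0):
    """Estimate where the pixel at `probe_yx` goes by intervening on it."""
    perturbed = frame.copy()
    perturbed[probe_yx] += eps  # intervention: nudge a single pixel
    # Difference between the prediction under intervention and the
    # unperturbed prediction; it is nonzero only where the nudge ends up.
    effect = np.abs(predictor(perturbed) - predictor(frame))
    target = np.unravel_index(np.argmax(effect), effect.shape)
    return (int(target[0] - probe_yx[0]), int(target[1] - probe_yx[1]))

rng = np.random.default_rng(0)
frame = rng.random((8, 8))
flow = estimate_motion(toy_predictor, frame, probe_yx=(3, 3))
print(flow)  # (1, 2), the shift built into the toy predictor
```

No motion labels are used anywhere: the displacement is recovered purely by querying the predictor twice, which is the sense in which such an estimate is zero-shot in this toy setting.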
Findings
ZWM can be learned from the first-person experience of a single child and then performs multiple physical understanding tasks.
ZWM recapitulates behavioral signatures of child development.
ZWM builds brain-like internal representations.
Abstract
Young children demonstrate early abilities to understand their physical world, estimating depth, motion, object coherence, interactions, and many other aspects of physical scene understanding. Children are both data-efficient and flexible cognitive systems, creating competence despite extremely limited training data, while generalizing to myriad untrained tasks -- a major challenge even for today's best AI systems. Here we introduce a novel computational hypothesis for these abilities, the Zero-shot Visual World Model (ZWM). ZWM is based on three principles: a sparse temporally-factored predictor that decouples appearance from dynamics; zero-shot estimation through approximate causal inference; and composition of inferences to build more complex abilities. We show that ZWM can be learned from the first-person experience of a single child, rapidly generating competence across multiple…
