Towards Visual Foundational Models of Physical Scenes
Chethan Parameshwara, Alessandro Achille, Matthew Trager, Xiaolong Li, Jiawei Mo, Ashwin Swaminathan, CJ Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto

TL;DR
This paper explores learning general-purpose visual representations of physical scenes using only image prediction as the training criterion. It proposes combining NeRFs with Diffusion Models so that the resulting representation gains the extrapolation ability that NeRFs alone lack.
Contribution
It introduces NeRF Diffusion, a combination of NeRFs and Diffusion Models intended to serve as an unsupervised representation of the physical scene, learned from visual data alone.
Findings
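NeRFs alone cannot represent the physical scene, because they lack extrapolation mechanisms.
Diffusion Models could, at least in theory, provide those extrapolation mechanisms.
NeRF Diffusion is put forward as a promising unsupervised scene representation with which to test this hypothesis empirically.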
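To give some intuition for the first finding, here is a minimal sketch of the standard NeRF rendering step in pure Python. Each pixel is an alpha-composited sum of colors sampled along one camera ray, weighted by w_i = T_i * alpha_i. The function name `render_ray`, the toy ray and its sample values are illustrative assumptions, not the authors' code, and this is an informal illustration rather than the paper's argument. The gradient of the rendered color with respect to a sample's color is exactly w_i. Content hidden behind an opaque surface is therefore essentially unconstrained by image reconstruction, and the field itself has no built-in way to extrapolate it.

```python
import math
from typing import List, Sequence, Tuple

Color = Tuple[float, float, float]


def render_ray(sigmas: Sequence[float], colors: Sequence[Color],
               deltas: Sequence[float]) -> Tuple[Color, List[float]]:
    """Alpha-composite the samples along one camera ray.

    sigmas[i] is the volume density at sample i, colors[i] its RGB color and
    deltas[i] the distance to the next sample. Returns the rendered pixel color
    C = sum_i w_i * c_i and the weights w_i = T_i * alpha_i.
    """
    transmittance = 1.0  # T_i: fraction of light reaching sample i unoccluded
    weights: List[float] = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # light left after passing this segment
    return (rgb[0], rgb[1], rgb[2]), weights


# Toy ray: empty space, an opaque surface, then two samples hidden behind it.
sigmas = [0.0, 50.0, 1.0, 1.0]
deltas = [0.1, 0.1, 0.1, 0.1]
grey = (0.5, 0.5, 0.5)
hidden_red = [(0.0, 0.0, 0.0), grey, (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
hidden_blue = [(0.0, 0.0, 0.0), grey, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]

pixel_red, weights = render_ray(sigmas, hidden_red, deltas)
pixel_blue, _ = render_ray(sigmas, hidden_blue, deltas)
# Repainting the hidden samples barely changes the pixel, because their
# weights (the gradient of the pixel w.r.t. their colors) are close to zero.
print(weights)
print(max(abs(a - b) for a, b in zip(pixel_red, pixel_blue)))
```

The same reasoning applies to regions that no training ray ever crosses. Those regions receive no weight at all, so whatever the field stores there is left to chance.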
Abstract
We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represent the physical scene, as they lack extrapolation mechanisms. Those, however, could be provided by Diffusion Models, at least in theory. To test this hypothesis empirically, NeRFs can be combined with Diffusion Models, a process we refer to as NeRF Diffusion, and the result used as an unsupervised representation of the physical scene. Our analysis is limited to visual data, without external grounding mechanisms that can be provided by independent sensory modalities.
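The hypothesis that Diffusion Models can supply the missing extrapolation rests on their being generative. They learn a distribution over data from which plausible content can be sampled, for example to fill in regions that were never observed. For readers unfamiliar with them, the following is a minimal sketch of a standard DDPM-style training step on toy vectors, in pure Python. It is not the paper's NeRF Diffusion model. The linear noise schedule, the function names and the untrained `zero_denoiser` are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical denoiser signature: (noisy vector x_t, timestep t) -> predicted noise.
Denoiser = Callable[[List[float], int], List[float]]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Per-step noise variances beta_t, increasing linearly with t."""
    span = max(num_steps - 1, 1)
    return [beta_start + (beta_end - beta_start) * t / span for t in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s): how much signal survives to step t."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def q_sample(x0: Sequence[float], t: int, alpha_bar: Sequence[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def denoising_loss(denoiser: Denoiser, x0: Sequence[float], t: int,
                   alpha_bar: Sequence[float], rng: random.Random) -> float:
    """Noise-prediction objective: mean squared error between true and predicted noise."""
    xt, eps = q_sample(x0, t, alpha_bar, rng)
    eps_hat = denoiser(xt, t)
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)


def zero_denoiser(xt: List[float], t: int) -> List[float]:
    """Stand-in for an untrained network: always predicts zero noise."""
    return [0.0] * len(xt)


betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.2, -0.4, 0.9]
print(alpha_bar[0], alpha_bar[-1])
print(denoising_loss(zero_denoiser, x0, 500, alpha_bar, rng))
```

In practice the denoiser is a neural network trained to minimize this loss over many samples and timesteps. Sampling then runs the learned reverse process starting from pure noise, which is what lets the model produce content absent from any single observation.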
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
Methods: Test · Diffusion
