Learning Physical Graph Representations from Visual Scenes

Daniel M. Bear; Chaofei Fan; Damian Mrowca; Yunzhu Li; Seth Alter; Aran Nayebi; Jeremy Schwartz; Li Fei-Fei; Jiajun Wu; Joshua B. Tenenbaum; Daniel L.K. Yamins

arXiv:2006.12373 · cs.CV · June 25, 2020 · 44 cites

TL;DR

This paper introduces Physical Scene Graphs (PSGs) and PSGNet, a novel neural network architecture that explicitly encodes objects, parts, and their physical properties in scenes, improving scene understanding beyond traditional CNNs.

Contribution

The paper proposes PSGs as hierarchical graph representations of scenes and PSGNet to learn these structures, integrating feedback, graph pooling, and perceptual grouping for enhanced scene segmentation.

Findings

01 PSGNet outperforms existing self-supervised methods on scene segmentation.

02 PSGNet generalizes well to unseen objects and arrangements.

03 Learned latent attributes capture intuitive scene properties.

Abstract

Convolutional Neural Networks (CNNs) have proved exceptional at learning representations for visual object categorization. However, CNNs do not explicitly encode objects, parts, and their physical properties, which has limited CNNs' success on tasks that require structured understanding of visual scenes. To overcome these limitations, we introduce the idea of Physical Scene Graphs (PSGs), which represent scenes as hierarchical graphs, with nodes in the hierarchy corresponding intuitively to object parts at different scales, and edges to physical connections between parts. Bound to each node is a vector of latent attributes that intuitively represent object properties such as surface shape and texture. We also describe PSGNet, a network architecture that learns to extract PSGs by reconstructing scenes through a PSG-structured bottleneck. PSGNet augments standard CNNs by including:…
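To make the graph structure described in the abstract concrete, here is a minimal Python sketch of a PSG-like container: nodes sit at levels of a hierarchy, each node carries an attribute vector, and grouping children creates a parent one level up. All names here (`PSGNode`, `PhysicalSceneGraph`, `group`) are hypothetical, and mean-pooling the child attributes is an assumption chosen for simplicity; it is not PSGNet's learned grouping or pooling, and it models only parent/child links rather than physical connections between parts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class PSGNode:
    """One node of a hypothetical physical scene graph."""
    node_id: int
    level: int                       # 0 is the finest scale
    attributes: List[float]          # latent attribute vector bound to the node
    parent: Optional[int] = None     # id of the node one level up, if any
    children: List[int] = field(default_factory=list)


@dataclass
class PhysicalSceneGraph:
    """Illustrative container; not the data structure used by PSGNet."""
    nodes: Dict[int, PSGNode] = field(default_factory=dict)

    def add_node(self, level: int, attributes: Sequence[float]) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = PSGNode(node_id, level, list(attributes))
        return node_id

    def group(self, child_ids: Sequence[int]) -> int:
        """Create a parent one level above the children.

        The parent's attributes are the element-wise mean of the children's
        attributes (an assumption made for this sketch).
        """
        if not child_ids:
            raise ValueError("group() needs at least one child")
        children = [self.nodes[c] for c in child_ids]
        dim = len(children[0].attributes)
        mean = [
            sum(c.attributes[i] for c in children) / len(children)
            for i in range(dim)
        ]
        parent_id = self.add_node(children[0].level + 1, mean)
        for child in children:
            child.parent = parent_id
        self.nodes[parent_id].children = list(child_ids)
        return parent_id

    def level_nodes(self, level: int) -> List[PSGNode]:
        """Return all nodes at the given level of the hierarchy."""
        return [n for n in self.nodes.values() if n.level == level]
```

For example, adding two level-0 nodes and calling `group` on their ids yields one level-1 node whose `children` list holds both ids.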

Code & Models

Repositories

neuroailab/tnn (TensorFlow)

Videos

Learning Physical Graph Representations from Visual Scenes · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization