Multiview Scene Graph

Juexiao Zhang; Gao Zhu; Sihang Li; Xinhao Liu; Haorui Song; Xinran Tang; Chen Feng

arXiv:2410.11187 · cs.CV · November 21, 2024

TL;DR

This paper introduces Multiview Scene Graphs (MSG), a topological scene representation built from unposed images, along with a new dataset, evaluation metrics, and a baseline method demonstrating improved performance in scene understanding tasks.

Contribution

The work presents the first MSG dataset, a novel evaluation metric, and a Transformer-based baseline method for constructing scene graphs from unposed images.

Findings

01. The proposed MSG dataset enables evaluation of multiview scene graphs.

02. A new metric based on intersection-over-union (IoU) scores MSG edges.

03. The baseline method outperforms existing approaches.
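
To make the intersection-over-union idea concrete, here is a minimal sketch of IoU computed over two sets of undirected graph edges. It is an illustration only, not the paper's metric definition: the function name `edge_iou`, the node labels, and the convention of returning 1.0 for two empty graphs are all assumptions made for this example.

```python
def edge_iou(pred_edges, gt_edges):
    """Intersection-over-union between two collections of undirected edges.

    Each edge is a pair of node identifiers. Order within a pair is ignored,
    so ("a", "b") and ("b", "a") count as the same edge.
    """
    pred = {frozenset(edge) for edge in pred_edges}
    gt = {frozenset(edge) for edge in gt_edges}
    union = pred | gt
    if not union:
        # Assumed convention: two empty edge sets agree perfectly.
        return 1.0
    return len(pred & gt) / len(union)


# Hypothetical toy graphs: image nodes "img*" and one object node "chair".
gt = [("img0", "img1"), ("img1", "img2"), ("img0", "chair")]
pred = [("img1", "img0"), ("img0", "chair"), ("img2", "chair")]
score = edge_iou(pred, gt)  # 2 shared edges out of 4 distinct edges
```

Treating edges as unordered pairs keeps the score independent of the order in which a method lists the two endpoints.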

Abstract

A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction, 3D bounding boxes in object detection, or voxel grids in occupancy prediction, or topological, such as pose graphs with loop closures in SLAM or visibility graphs in SfM. In this work, we propose to build Multiview Scene Graphs (MSG) from unposed images, representing a scene topologically with interconnected place and object nodes. The task of building MSG is challenging for existing representation learning methods since it needs to jointly address visual place recognition, object detection, and object association from images with limited fields of view and potentially large viewpoint changes. To evaluate any method tackling this task, we…
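
As a rough picture of a graph with interconnected place and object nodes, the sketch below stores the two node types and the edges between them in plain Python sets. It is a hypothetical container, not the authors' data format: the class name, field names, and the assumption that each image contributes one place node are all introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MultiviewSceneGraph:
    """Toy topological scene graph (illustrative, not the official format)."""

    places: set = field(default_factory=set)        # assumed: one node per image
    objects: set = field(default_factory=set)       # one node per associated object
    place_place: set = field(default_factory=set)   # undirected place-place edges
    place_object: set = field(default_factory=set)  # (place, object) edges

    def add_image(self, image_id, object_ids):
        """Add a place node and link it to every object detected in it."""
        self.places.add(image_id)
        for obj in object_ids:
            self.objects.add(obj)
            self.place_object.add((image_id, obj))

    def connect_places(self, a, b):
        """Record that two images were recognized as the same place."""
        self.place_place.add(frozenset((a, b)))

    def images_seeing(self, obj):
        """Return the sorted list of images in which an object appears."""
        return sorted(p for p, o in self.place_object if o == obj)


graph = MultiviewSceneGraph()
graph.add_image("img0", ["chair"])
graph.add_image("img1", ["chair", "table"])
graph.connect_places("img0", "img1")
```

Because the same object identifier is reused across images, object association shows up as several place nodes sharing one object node.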

Code & Models

Repositories

ai4ce/MSG
PyTorch · Official

Datasets

ai4ce/MSG
dataset · 113 downloads

Videos

Multiview Scene Graph · SlidesLive

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Artificial Intelligence in Games

Methods: Dense Connections · Residual Connection · Dropout · Layer Normalization · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Attention Is All You Need · Linear Layer