Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning

Paul Gay; Stuart James; Alessio Del Bue

arXiv:1807.05933 · cs.CV · November 8, 2018

TL;DR

This paper introduces a system that uses the multi-view geometric relations in video sequences to generate 3D scene graphs, improving scene understanding by combining geometric and visual features within an RNN framework.
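To make the target output concrete, here is a minimal Python sketch of a 3D scene graph as a data structure: objects are nodes carrying a label and an estimated 3D centre, and relationships are directed (subject, predicate, object) triples. The class names, the `vertical_predicate` rule, and its `margin` threshold are hypothetical illustrations, not the paper's model, which obtains relationships from merged geometric and visual features. The hand-written rule only shows why 3D positions make some spatial predicates easy to read off.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneObject:
    """A node: an object category plus its estimated 3D centre (x, y, z)."""
    label: str
    center: Tuple[float, float, float]


@dataclass
class SceneGraph:
    """Objects as nodes; directed (subject, predicate, object) triples as edges."""
    objects: Dict[int, SceneObject] = field(default_factory=dict)
    relations: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, obj_id: int, obj: SceneObject) -> None:
        self.objects[obj_id] = obj

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        # Edges may only connect objects that are already in the graph.
        if subj not in self.objects or obj not in self.objects:
            raise KeyError("both endpoints must be existing objects")
        self.relations.append((subj, predicate, obj))

    def triples(self) -> List[Tuple[str, str, str]]:
        """Human-readable view of every edge, using object labels."""
        return [(self.objects[s].label, p, self.objects[o].label)
                for s, p, o in self.relations]


def vertical_predicate(a: SceneObject, b: SceneObject,
                       up_axis: int = 2, margin: float = 0.1) -> str:
    """Hypothetical toy rule: compare the two centres along the up axis."""
    dz = a.center[up_axis] - b.center[up_axis]
    if dz > margin:
        return "above"
    if dz < -margin:
        return "below"
    return "level with"


if __name__ == "__main__":
    g = SceneGraph()
    g.add_object(0, SceneObject("lamp", (0.0, 0.0, 1.2)))
    g.add_object(1, SceneObject("table", (0.0, 0.1, 0.7)))
    g.relate(0, vertical_predicate(g.objects[0], g.objects[1]), 1)
    print(g.triples())  # → [('lamp', 'above', 'table')]
```

Replacing the toy rule with a learned classifier would leave the container unchanged, which is one reason to keep graph storage separate from relationship prediction.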

Contribution

It presents a new model that merges geometric and visual features using an RNN to construct 3D scene graphs from video sequences, addressing limitations of single-image scene understanding.
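The contribution statement says geometric and visual features are merged with an RNN. The sketch below shows one generic way such a merge can work, assuming one feature vector of each kind per video frame: the two vectors are concatenated and fed to a gated recurrent unit (GRU), whose hidden state accumulates evidence across frames. The source does not specify the cell type, the feature sizes, or what the recurrence runs over, so the GRU choice, the frame-wise recurrence, the random untrained weights, and the names `GRUCell` and `fuse_sequence` are all assumptions. It is written in plain Python so it runs without a deep-learning framework.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class GRUCell:
    """One common GRU formulation, with small random weights and no biases."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        cols = input_size + hidden_size

        def init() -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(hidden_size)]

        self.w_z, self.w_r, self.w_h = init(), init(), init()
        self.hidden_size = hidden_size

    def step(self, x: Vector, h: Vector) -> Vector:
        xh = x + h                                        # concatenate [x, h]
        z = [sigmoid(a) for a in matvec(self.w_z, xh)]    # update gate
        r = [sigmoid(a) for a in matvec(self.w_r, xh)]    # reset gate
        xrh = x + [ri * hi for ri, hi in zip(r, h)]       # [x, r * h]
        h_tilde = [math.tanh(a) for a in matvec(self.w_h, xrh)]  # candidate state
        # New state: per-component convex mix of old state and candidate.
        return [(1 - zi) * hi + zi * ti for zi, hi, ti in zip(z, h, h_tilde)]


def fuse_sequence(visual: List[Vector], geometric: List[Vector],
                  cell: GRUCell) -> Vector:
    """Assumption: the recurrence runs over video frames, one step per frame."""
    if len(visual) != len(geometric):
        raise ValueError("need one visual and one geometric vector per frame")
    h = [0.0] * cell.hidden_size
    for v, g in zip(visual, geometric):
        h = cell.step(v + g, h)  # early fusion by concatenation
    return h


if __name__ == "__main__":
    cell = GRUCell(input_size=7, hidden_size=5)
    visual = [[0.1, 0.2, 0.0, 0.3]] * 3   # hypothetical 4-d appearance features
    geometric = [[0.5, 0.2, 4.0]] * 3     # hypothetical 3D object centre per frame
    h = fuse_sequence(visual, geometric, cell)
    print(len(h))  # → 5
```

With untrained weights the output values carry no meaning; the point is the data flow. Because each new state is a convex combination of the previous state and a `tanh` candidate, every component stays inside (-1, 1) when starting from zeros.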

Findings

01. Effective 3D scene graph generation from multi-view videos
02. Improved scene understanding through geometric reasoning
03. New dataset for 3D scene graph tasks

Abstract

Recent approaches to visual scene understanding attempt to build a scene graph: a computational representation of objects and their pairwise relationships. Such a rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when the scene contains complex spatial arrangements. In contrast, an image sequence conveys useful information through the multi-view geometric relations that arise from camera motion; in such cases, object relationships are naturally tied to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding this geometrical reasoning. The representation is obtained using a new model in which geometric and visual features are merged within an RNN framework. We…
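The first stage described in the abstract computes where each object is in 3D from several views. The abstract does not say which geometric model is used, so the sketch below substitutes the simplest standard technique: linear least-squares triangulation of a single point (for instance, the centre of an object's 2D detection) from two or more calibrated views with known 3×4 projection matrices. The function name `triangulate` and the use of detection centres are assumptions; the paper's localisation step may estimate richer object geometry than a single point.

```python
from typing import List, Sequence, Tuple

Matrix34 = Sequence[Sequence[float]]  # 3x4 camera projection matrix P = K[R | t]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate(cameras: Sequence[Matrix34],
                pixels: Sequence[Tuple[float, float]]) -> Tuple[float, ...]:
    """Linear least-squares triangulation of one 3D point from N >= 2 views.

    Each observation (u, v) under camera P gives two linear constraints on
    the homogeneous point X = (x, y, z, 1):
        (u * P[2] - P[0]) . X = 0
        (v * P[2] - P[1]) . X = 0
    """
    rows, rhs = [], []
    for P, (u, v) in zip(cameras, pixels):
        for coord, k in ((u, 0), (v, 1)):
            r = [coord * P[2][j] - P[k][j] for j in range(4)]
            rows.append(r[:3])  # coefficients of (x, y, z)
            rhs.append(-r[3])   # constant term moved to the right-hand side
    # Normal equations (A^T A) X = A^T b give a 3x3 system.
    ata = [[sum(a[i] * a[j] for a in rows) for j in range(3)] for i in range(3)]
    atb = [sum(a[i] * b for a, b in zip(rows, rhs)) for i in range(3)]
    d = _det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration (e.g. coincident camera centres)")
    # Cramer's rule: replace column i with the right-hand side.
    solution = []
    for i in range(3):
        mi = [[atb[r] if c == i else ata[r][c] for c in range(3)] for r in range(3)]
        solution.append(_det3(mi) / d)
    return tuple(solution)


if __name__ == "__main__":
    # Two identity-intrinsics cameras; the second is translated 1 unit along x.
    P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
    # Projections of the point (0.5, 0.2, 4.0) into each view.
    X = triangulate([P1, P2], [(0.125, 0.05), (-0.125, 0.05)])
    print([round(c, 6) for c in X])  # → [0.5, 0.2, 4.0]
```

With noisy detections the same code returns the algebraic least-squares estimate; production pipelines usually refine it by minimising reprojection error, but that step is beyond this sketch.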

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition