MOLTR: Multiple Object Localisation, Tracking, and Reconstruction from Monocular RGB Videos

Kejie Li; Hamid Rezatofighi; Ian Reid

arXiv:2012.05360 · cs.CV · February 16, 2021 · 1 citation

TL;DR

MOLTR is a monocular RGB video-based system that localizes, tracks, and reconstructs multiple objects with semantic understanding, enabling object-centric mapping for robotics and AR/VR.

Contribution

It introduces an online method for object-centric mapping from monocular RGB videos with known camera poses, combining monocular 3D detection, a learned shape embedding, and Bayesian filtering.
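To make "Bayesian filtering" concrete, the sketch below shows the simplest member of that family: a single-model, constant-velocity Kalman filter over one coordinate of an object's position. This is an illustrative stand-in, not MOLTR's filter; the class name, the noise values `q` and `r`, and the simplified process noise `Q = q * I` are all assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class ConstantVelocityKF:
    """1D constant-velocity Kalman filter over state [position, velocity].

    Hypothetical illustration only; not the filter used by MOLTR.
    """
    x: list          # state mean [p, v]
    P: list          # 2x2 state covariance
    q: float = 0.01  # process-noise scale (assumed value)
    r: float = 0.1   # measurement-noise variance (assumed value)

    def predict(self, dt: float) -> None:
        # Motion model F = [[1, dt], [0, 1]]: position advances by dt * velocity.
        p, v = self.x
        self.x = [p + dt * v, v]
        # P <- F P F^T + Q, with Q simplified to q * I (an assumption).
        a, b = self.P[0]
        c, d = self.P[1]
        self.P = [
            [a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only position is observed.
        s = self.P[0][0] + self.r  # innovation variance
        k0 = self.P[0][0] / s      # Kalman gain for position
        k1 = self.P[1][0] / s      # Kalman gain for velocity
        y = z - self.x[0]          # innovation (measurement residual)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        # P <- (I - K H) P
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
```

Feeding position measurements 1.0, 2.0, 3.0 at `dt = 1.0` (calling `predict` then `update` each step) yields a positive velocity estimate and a position estimate close to 3.0. A system that must handle both static and moving objects would typically run several such motion models and weigh them against each other, which is the motivation for multiple-model filters.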

Findings

01. Superior performance over previous approaches on indoor and outdoor datasets
02. Effective object localization and tracking
03. Progressive shape refinement
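One simple way to picture progressive shape refinement is to keep a running average of the per-frame shape codes predicted for a tracked object, so the estimate stabilises as more views arrive. The sketch below assumes plain averaging in the embedding space; that fusion rule and the `ShapeCodeFuser` name are hypothetical and not taken from the paper.

```python
class ShapeCodeFuser:
    """Running mean of per-frame shape codes for one tracked object.

    Hypothetical helper; plain averaging is an assumed fusion rule.
    """

    def __init__(self, dim: int) -> None:
        self.mean = [0.0] * dim
        self.count = 0

    def add(self, code: list) -> list:
        if len(code) != len(self.mean):
            raise ValueError("shape code has the wrong dimension")
        self.count += 1
        # Incremental mean: m_n = m_{n-1} + (c_n - m_{n-1}) / n
        self.mean = [m + (c - m) / self.count for m, c in zip(self.mean, code)]
        return self.mean
```

Adding the codes `[1, 0, 0]`, `[0, 1, 0]`, and `[0, 0, 1]` in turn returns a fused code of roughly `[1/3, 1/3, 1/3]`. A weighted mean, for example by detection confidence, is an equally plausible variant.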

Abstract

Semantic-aware reconstruction is more advantageous than geometric-only reconstruction for future robotic and AR/VR applications because it represents not only where things are, but also what things are. Object-centric mapping is the task of building an object-level reconstruction in which objects are separate, meaningful entities that convey both geometric and semantic information. In this paper, we present MOLTR, a solution to object-centric mapping using only monocular image sequences and camera poses. It is able to localise, track, and reconstruct multiple objects in an online fashion as an RGB camera captures a video of its surroundings. Given a new RGB frame, MOLTR first applies a monocular 3D detector to localise objects of interest and extract their shape codes, which represent the object shapes in a learned embedding space. Detections are then merged into existing objects in the map…
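The per-frame step described above (detect in the camera frame, then merge into the map) can be sketched under simple assumptions: each detection is reduced to a 3D centroid in camera coordinates, the known camera pose is a camera-to-world rotation `R` and translation `t`, and detections are matched to existing map objects by greedy nearest-neighbour search with a distance gate. The function names and the gate value are hypothetical, and the matching criterion here is a stand-in, not MOLTR's association method.

```python
import math


def camera_to_world(p_cam, R, t):
    """Map a 3D point from camera to world coordinates: p_w = R p_c + t.

    R (3x3, row-major) and t are the camera-to-world rotation and
    translation, i.e. the known camera pose assumed as input.
    """
    return tuple(sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3))


def associate(detections, tracks, gate=1.0):
    """Greedy nearest-neighbour association on world-frame centroids.

    Returns (matches, unmatched), where matches maps detection index to
    track index. Hypothetical stand-in for a real data-association step.
    """
    pairs = sorted(
        (math.dist(d, tr), di, ti)
        for di, d in enumerate(detections)
        for ti, tr in enumerate(tracks)
    )
    used_d, used_t, matches = set(), set(), {}
    for dist, di, ti in pairs:
        if dist > gate:
            break  # pairs are sorted, so no remaining pair passes the gate
        if di in used_d or ti in used_t:
            continue  # each detection and each track is matched at most once
        matches[di] = ti
        used_d.add(di)
        used_t.add(ti)
    unmatched = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched
```

With an identity rotation and `t = (0, 0, 1)`, a detection at camera point `(0, 0, 2)` lands at world point `(0, 0, 3)` and matches a map object at `(0, 0, 3.2)`, while a far-away detection is reported as unmatched and would seed a new object. Greedy matching is easy to read but not optimal; an optimal assignment such as SciPy's `scipy.optimize.linear_sum_assignment` is the usual upgrade.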


Taxonomy

Topics: Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage · Advanced Vision and Imaging

Methods: Attentive Walk-Aggregating Graph Neural Network