Large Scale Joint Semantic Re-Localisation and Scene Understanding via Globally Unique Instance Coordinate Regression
Ignas Budvytis, Marvin Teichmann, Tomas Vojir, Roberto Cipolla

TL;DR
This paper introduces a novel joint approach for semantic localisation and scene understanding that predicts 3D geometry and camera pose simultaneously, outperforming existing methods on real and synthetic datasets.
Contribution
It proposes a two-step neural network method for scene coordinate regression that scales to larger maps and integrates object recognition with local coordinate prediction.
Findings
Achieves smaller pose estimation errors than state-of-the-art methods.
Effectively predicts accurate 3D geometry of static objects.
Scales to maps several orders of magnitude larger than previous approaches.
Abstract
In this work we present a novel approach to joint semantic localisation and scene understanding. Our work is motivated by the need for localisation algorithms which not only predict 6-DoF camera pose but also simultaneously recognise surrounding objects and estimate 3D geometry. Such capabilities are crucial for computer-vision-guided systems which interact with the environment: autonomous driving, augmented reality and robotics. In particular, we propose a two-step procedure. During the first step we train a convolutional neural network to jointly predict per-pixel globally unique instance labels and corresponding local coordinates for each instance of a static object (e.g. a building). During the second step we obtain scene coordinates by combining object center coordinates and local coordinates and use them to perform 6-DoF camera pose estimation. We evaluate our approach on real…
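The combination described in the second step can be illustrated with a minimal sketch. It assumes that the local coordinates are per-pixel 3D offsets from the instance center, expressed in the same axes as the global frame, so that scene coordinates are obtained by simple addition; the abstract does not state the exact parameterisation. The function name `scene_coordinates` and the array layouts are hypothetical and chosen only for illustration.

```python
import numpy as np

def scene_coordinates(instance_labels, local_coords, instance_centers):
    """Combine per-pixel instance labels and local coordinates into scene coordinates.

    instance_labels:  (H, W) integer array of globally unique instance ids.
    local_coords:     (H, W, 3) float array, assumed to be offsets from the instance center.
    instance_centers: (K, 3) float array, 3D center of each instance (row index = instance id).
    Returns an (H, W, 3) array of 3D scene coordinates.
    """
    # Look up the 3D center of the instance predicted at every pixel.
    centers_per_pixel = instance_centers[instance_labels]  # shape (H, W, 3)
    # Assumption: scene coordinate = object center + local offset.
    return centers_per_pixel + local_coords

# Toy example: a 2x2 image showing two instances.
labels = np.array([[0, 1],
                   [1, 0]])
local = np.full((2, 2, 3), 0.5)
centers = np.array([[0.0, 0.0, 0.0],
                    [10.0, 20.0, 30.0]])
coords = scene_coordinates(labels, local, centers)
```

Each pixel then carries a 2D image position and a 3D scene coordinate. One common way to turn such 2D-3D correspondences into a 6-DoF camera pose is a PnP solver inside a RANSAC loop (for example OpenCV's `cv2.solvePnPRansac`); whether the paper uses that particular solver is not stated in the abstract.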
