Translating Images into Maps

Avishkar Saha; Oscar Mendez Maldonado; Chris Russell; Richard Bowden

arXiv:2110.00966 · cs.CV · March 31, 2022

TL;DR

This paper introduces a transformer-based method that converts images into top-down maps in real time. By framing map generation as a set of sequence-to-sequence translations, it reports state-of-the-art results on large-scale datasets.

Contribution

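The authors propose a constrained transformer network that models image-to-map translation as a sequence-to-sequence problem. The constraint comes from a physical assumption: each vertical scanline in the image corresponds to one ray through the camera location in the overhead map, which keeps the network efficient and improves accuracy.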
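As a rough illustration of the sequence-to-sequence framing, the sketch below translates each image column into a sequence of features along one overhead ray using a single cross-attention step. It is not the authors' network: there are no learned projections, the column features serve as both keys and values, and the names `translate_column`, `translate_image`, and `ray_queries` are hypothetical. Reusing the same queries for every column is only meant to mirror the idea of a network that is shared (convolutional) along the horizontal direction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def translate_column(column_feats, ray_queries):
    """Cross-attend from ray positions to the pixels of one image column.

    column_feats: H feature vectors (one per image row), each of length C.
    ray_queries:  R query vectors (one per depth bin on the ray), length C.
    Returns R vectors of length C, one per position along the ray.
    """
    scale = math.sqrt(len(column_feats[0]))
    ray_feats = []
    for q in ray_queries:
        # Scaled dot-product score between this ray position and every pixel.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in column_feats]
        weights = softmax(scores)
        # Weighted sum of column features (keys double as values here).
        ray_feats.append([
            sum(w * k[c] for w, k in zip(weights, column_feats))
            for c in range(len(q))
        ])
    return ray_feats


def translate_image(image_feats, ray_queries):
    """Apply the same column-to-ray translation to every image column.

    image_feats[u] holds the H feature vectors of column u.
    The result is a polar map indexed as [column][depth bin][channel].
    """
    return [translate_column(col, ray_queries) for col in image_feats]


# Toy input: W=3 columns, H=4 rows, C=2 channels, R=5 depth bins.
feats = [[[float(u + v), 1.0] for v in range(4)] for u in range(3)]
queries = [[0.1 * r, -0.2] for r in range(5)]
polar_map = translate_image(feats, queries)
```

Each output vector is a convex combination of the pixels in its column, so context from the whole scanline can inform every position on the ray.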

Findings

01

Achieved relative improvements of 15% on nuScenes and 30% on Argoverse.

02

Developed a restricted transformer architecture that treats image-to-map translation as a set of sequence translations and is convolutional in the horizontal direction only.

03

Demonstrated state-of-the-art performance in instantaneous mapping tasks.

Abstract

We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird's-eye-view (BEV) of the world, in a single end-to-end network. We assume a 1-1 correspondence between a vertical scanline in the image, and rays passing through the camera location in an overhead map. This lets us formulate map generation from an image as a set of sequence-to-sequence translations. Posing the problem as translation allows the network to use the context of the image when interpreting the role of each pixel. This constrained formulation, based upon a strong physical grounding of the problem, leads to a restricted transformer network that is convolutional in the horizontal direction only. The structure allows us to make efficient use of…
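The scanline-to-ray correspondence assumed in the abstract can be made concrete with basic pinhole geometry. The sketch below assumes an ideal, level pinhole camera with no roll, pitch, or lens distortion; the intrinsics `fx` and `cx` and all function names are illustrative and do not come from the paper or its repository. Under these assumptions, every point on the vertical plane through the camera centre and image column `u` projects to that column, so the column corresponds to a single ray in the overhead map.

```python
import math


def column_to_ray_angle(u, fx, cx):
    """Angle in radians of the overhead ray seen by image column u.

    Zero means straight ahead; positive angles are to the right.
    """
    return math.atan2(u - cx, fx)


def bev_cell_to_polar(x, z, fx, cx):
    """Map an overhead point (x right, z forward) to (column, range).

    Returns the image column whose ray passes through the point and the
    distance of the point from the camera along that ray.
    """
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx
    r = math.hypot(x, z)
    return u, r


def polar_to_bev(u, r, fx, cx):
    """Inverse mapping: a (column, range) pair back to overhead (x, z)."""
    theta = column_to_ray_angle(u, fx, cx)
    return r * math.sin(theta), r * math.cos(theta)


# Hypothetical intrinsics: focal length 1000 px, principal point at column 800.
u, r = bev_cell_to_polar(5.0, 20.0, fx=1000.0, cx=800.0)
x, z = polar_to_bev(u, r, fx=1000.0, cx=800.0)
```

A polar map indexed by (column, range) can be resampled onto a Cartesian overhead grid by evaluating `bev_cell_to_polar` at each grid cell and interpolating.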

Code & Models

Repositories

avishkarsaha/translating-images-into-maps
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Multimodal Machine Learning Applications