Trans4Map: Revisiting Holistic Bird's-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers

Chang Chen; Jiaming Zhang; Kailun Yang; Kunyu Peng; Rainer Stiefelhagen

arXiv:2207.06205 · cs.CV · October 17, 2022

TL;DR

Trans4Map introduces a Transformer-based framework that efficiently converts egocentric images into allocentric semantic maps, outperforming previous models in accuracy and parameter efficiency.

Contribution

The paper presents a novel end-to-end Transformer-based approach with a Bidirectional Allocentric Memory module for holistic mapping from egocentric images.

Findings

1. Achieves state-of-the-art accuracy on the Matterport3D dataset.
2. Reduces model parameters by 67.2%.
3. Improves mIoU by 3.25% and mBF1 by 4.09%.
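
For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how mean Intersection-over-Union is commonly computed for semantic maps. It is not the authors' evaluation code: the function name `mean_iou`, the toy label grids, and the choice to average only over classes present in the prediction or ground truth are illustrative assumptions.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in pred or target (hypothetical helper)."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both maps
    return float((intersection[valid] / union[valid]).mean())


# Toy 2x2 top-down semantic maps with two classes (made-up data).
pred_map = [[0, 0], [1, 1]]
true_map = [[0, 1], [1, 1]]
score = mean_iou(pred_map, true_map, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Reported gains such as the one above are typically differences between two such scores, each expressed as a percentage.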

Abstract

Humans have an innate ability to sense their surroundings: they can extract a spatial representation from egocentric perception and form an allocentric semantic map via spatial transformation and memory updating. However, endowing mobile agents with such a spatial sensing ability remains a challenge, owing to two difficulties: (1) previous convolutional models are limited by their local receptive field and thus struggle to capture holistic long-range dependencies during observation; (2) the excessive computational budgets required for success often lead to a separation of the mapping pipeline into stages, rendering the entire mapping process inefficient. To address these issues, we propose an end-to-end one-stage Transformer-based framework for Mapping, termed Trans4Map. Our egocentric-to-allocentric mapping process includes three steps: (1) the efficient transformer extracts…

Code & Models

Repositories

jamycheung/trans4map (PyTorch, official)

Taxonomy

Topics: Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques