MapFM: Foundation Model-Driven HD Mapping with Multi-Task Contextual Learning
Leonid Ivanov, Vasily Yuryev, Dmitry Yudin

TL;DR
MapFM is an end-to-end model that combines a foundation model image encoder with multi-task learning to generate vectorized HD maps online for autonomous driving, improving scene understanding and map accuracy.
Contribution
The paper introduces MapFM, a novel foundation model-driven approach that integrates multi-task learning for improved HD map prediction in autonomous driving.
Findings
Encoding camera images with a foundation model significantly improves feature representation quality.
Auxiliary BEV semantic segmentation heads, trained jointly as a multi-task objective, improve map prediction accuracy.
Online vectorized HD map generation is demonstrated.
Abstract
In autonomous driving, high-definition (HD) maps and semantic maps in bird's-eye view (BEV) are essential for accurate localization, planning, and decision-making. This paper introduces an enhanced end-to-end model named MapFM for online vectorized HD map generation. We show that incorporating a powerful foundation model for encoding camera images significantly boosts feature representation quality. To further enrich the model's understanding of the environment and improve prediction quality, we integrate auxiliary prediction heads for semantic segmentation in the BEV representation. This multi-task learning approach provides richer contextual supervision, leading to a more comprehensive scene representation and ultimately resulting in higher accuracy and improved quality of the predicted vectorized HD maps. The source code is available at https://github.com/LIvanoff/MapFM.
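The multi-task idea in the abstract, a main vectorized-map objective plus an auxiliary BEV segmentation objective, can be illustrated with a small sketch. This is not the MapFM implementation: the L1 point loss, the binary cross-entropy segmentation loss, the function names, and the `seg_weight` value are all assumptions chosen for illustration, and the abstract does not state which losses or weights the authors use.

```python
import math


def l1_polyline_loss(pred_points, gt_points):
    """Mean absolute error over the x and y coordinates of matched points.

    Assumption: predicted and ground-truth polylines are already matched
    point-to-point and have the same length.
    """
    total = 0.0
    for (px, py), (gx, gy) in zip(pred_points, gt_points):
        total += abs(px - gx) + abs(py - gy)
    return total / (2 * len(gt_points))


def bev_bce_loss(pred_probs, gt_mask, eps=1e-7):
    """Mean binary cross-entropy over a 2D BEV grid.

    pred_probs: rows of per-cell probabilities in [0, 1].
    gt_mask:    rows of 0/1 labels (1 = map element present in the cell).
    """
    total, count = 0.0, 0
    for prob_row, gt_row in zip(pred_probs, gt_mask):
        for p, g in zip(prob_row, gt_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -(g * math.log(p) + (1 - g) * math.log(1.0 - p))
            count += 1
    return total / count


def multitask_loss(pred_points, gt_points, pred_probs, gt_mask, seg_weight=0.5):
    """Main vectorized-map loss plus a weighted auxiliary segmentation loss.

    seg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    vector_loss = l1_polyline_loss(pred_points, gt_points)
    seg_loss = bev_bce_loss(pred_probs, gt_mask)
    return vector_loss + seg_weight * seg_loss


if __name__ == "__main__":
    # Toy example: a 2-point polyline and a 1x2 BEV grid.
    loss = multitask_loss(
        pred_points=[(0.0, 0.0), (1.0, 1.0)],
        gt_points=[(0.0, 0.0), (1.0, 2.0)],
        pred_probs=[[0.5, 0.5]],
        gt_mask=[[1, 0]],
    )
    print(f"total loss: {loss:.4f}")
```

In a setup like this, the auxiliary term only shapes the shared features during training. The segmentation head can be dropped at inference time if only the vectorized map is needed.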
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques
