TL;DR
This paper introduces DASGIL, a multi-task model that fuses semantic and geometric features for robust image-based localization across different environments, using adversarial domain adaptation from synthetic to real-world data.
Contribution
It presents a novel multi-task architecture with a multi-scale feature discriminator for domain adaptation in visual localization tasks.
Findings
Outperforms state-of-the-art methods on the Extended CMU-Seasons and Oxford RobotCar datasets.
Adapts from the synthetic Virtual KITTI dataset to the real-world KITTI dataset through adversarial training, so no human-annotated ground truth is required.
Improves large-scale place recognition under challenging environmental variations such as changes in season and illumination.
Abstract
Long-term visual localization under changing environments is a challenging problem in autonomous driving and mobile robotics because of seasonal and illumination variation, among other factors. Image retrieval is an efficient and effective solution to this problem. In this paper, we propose a novel multi-task architecture that fuses geometric and semantic information into a multi-scale latent embedding representation for visual place recognition. To use high-quality ground truth without any human effort, we propose a multi-scale feature discriminator for adversarial training, which achieves domain adaptation from the synthetic Virtual KITTI dataset to the real-world KITTI dataset. The proposed approach is validated on the Extended CMU-Seasons dataset and the Oxford RobotCar dataset through a series of comparison experiments, where our performance outperforms state-of-the-art…
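The retrieval step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: in DASGIL the descriptors would come from a trained encoder, whereas the toy vectors, the function names (`l2_normalize`, `fuse_scales`, `retrieve`), and the choice of per-scale normalization followed by concatenation and cosine similarity are all assumptions made here for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_scales(per_scale_features):
    """Build one descriptor from several per-scale feature vectors.

    Assumption for this sketch: each scale is normalized on its own so that
    no single scale dominates, then all scales are concatenated and the
    result is normalized again.
    """
    fused = []
    for feat in per_scale_features:
        fused.extend(l2_normalize(feat))
    return l2_normalize(fused)


def retrieve(query, database):
    """Return (index, cosine similarity) of the best database match.

    All descriptors are assumed to be unit-norm, so the dot product
    equals the cosine similarity.
    """
    best_idx, best_sim = -1, -math.inf
    for idx, ref in enumerate(database):
        sim = sum(q * r for q, r in zip(query, ref))
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx, best_sim


# Hypothetical toy data: two reference places, each with features at two scales.
reference_features = [
    [[1.0, 0.0], [0.0, 1.0, 0.0]],  # place 0
    [[0.0, 1.0], [1.0, 0.0, 0.0]],  # place 1
]
database = [fuse_scales(f) for f in reference_features]

# A query that resembles place 1 under slightly different conditions.
query = fuse_scales([[0.1, 0.9], [0.8, 0.1, 0.0]])

best_idx, best_sim = retrieve(query, database)
print("best match:", best_idx, "similarity:", round(best_sim, 3))
```

In a retrieval-based localization pipeline, the pose attached to the best-matching reference image would then serve as the pose estimate for the query. A linear scan is used here only for clarity; a large reference database would normally call for an approximate nearest-neighbor index.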
