TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior

Sen Yang; Minyue Jiang; Ziwei Fan; Xiaolu Xie; Xiao Tan; Yingying Li; Errui Ding; Liang Wang; Jingdong Wang

arXiv:2411.14751·cs.CV·November 25, 2024

Open Access · 3 Reviews

TL;DR

This paper introduces TopoSD, a perception model that uses SDMap priors and topology-guided decoding to improve lane segment perception and topology reasoning in autonomous driving. It reports gains of +6.7 mAP and +9.1 on topology metrics over prior state-of-the-art methods.

Contribution

The paper proposes an approach for incorporating SDMap priors into lane perception models, improving long-range perception and topology reasoning in BEV lane segment perception.

Findings

01 Outperforms state-of-the-art methods by +6.7 mAP and +9.1 on topology metrics.

02 SDMap noise augmentation improves model robustness.

03 Lanes, centrelines, and topology are predicted jointly and effectively.
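The noise augmentation in finding 02 can be illustrated with a small sketch. It assumes the SDMap misalignment is modelled as one random rigid transform (a rotation about the ego origin followed by a translation) applied to all SDMap polylines in BEV coordinates. The function name `perturb_sdmap` and the noise ranges are hypothetical and are not taken from the paper.

```python
import math
import random


def perturb_sdmap(polylines, max_shift_m=5.0, max_rot_deg=5.0, rng=None):
    """Apply one random rigid transform to every SDMap polyline.

    polylines: list of polylines, each a list of (x, y) points in metres.
    max_shift_m, max_rot_deg: hypothetical noise bounds (assumptions).
    """
    rng = rng or random.Random()
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    dx = rng.uniform(-max_shift_m, max_shift_m)
    dy = rng.uniform(-max_shift_m, max_shift_m)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rotate each point about the origin, then translate it.
    return [
        [(cos_t * x - sin_t * y + dx, sin_t * x + cos_t * y + dy) for x, y in line]
        for line in polylines
    ]


if __name__ == "__main__":
    sdmap = [[(0.0, 0.0), (10.0, 0.0)], [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]]
    print(perturb_sdmap(sdmap, rng=random.Random(0)))
```

Because the transform is rigid, road shapes and lengths are preserved and only their pose relative to the ego vehicle changes, which is one plausible way to mimic a misaligned SDMap during training.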

Abstract

Recent advances in autonomous driving systems have shifted towards reducing reliance on high-definition maps (HDMaps) due to the huge costs of annotation and maintenance. Instead, researchers are focusing on online vectorized HDMap construction using on-board sensors. However, sensor-only approaches still face challenges in long-range perception due to the restricted views imposed by the mounting angles of onboard cameras, just as human drivers also rely on bird's-eye-view navigation maps for a comprehensive understanding of road structures. To address these issues, we propose to train the perception model to "see" standard definition maps (SDMaps). We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information to improve the bird's eye view (BEV) feature for lane geometry and topology decoding.…
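As a rough illustration of the "spatial map representation" mentioned in the abstract, the sketch below rasterizes SDMap polylines onto a BEV grid, which is the kind of input an image encoder could then turn into features. The grid extent, the resolution, the single binary channel, and the name `rasterize_sdmap` are all assumptions made for illustration. The paper's actual encoding (including road types and instance tokens) is not reproduced here.

```python
import math


def rasterize_sdmap(polylines, x_range=(-50.0, 50.0), y_range=(-25.0, 25.0), res=0.5):
    """Draw SDMap polylines into a binary BEV grid (rows index y, columns index x).

    polylines: list of polylines, each a list of (x, y) points in metres.
    x_range, y_range, res: hypothetical BEV extent and cell size in metres.
    """
    width = int(round((x_range[1] - x_range[0]) / res))
    height = int(round((y_range[1] - y_range[0]) / res))
    grid = [[0] * width for _ in range(height)]
    for line in polylines:
        for (x0, y0), (x1, y1) in zip(line, line[1:]):
            # Sample each segment at roughly half-cell spacing.
            n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / (res * 0.5)))
            for i in range(n + 1):
                t = i / n
                x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                col = math.floor((x - x_range[0]) / res)
                row = math.floor((y - y_range[0]) / res)
                # Points outside the BEV extent are dropped.
                if 0 <= row < height and 0 <= col < width:
                    grid[row][col] = 1
    return grid


if __name__ == "__main__":
    grid = rasterize_sdmap([[(-40.0, 0.0), (40.0, 0.0)]])
    print(len(grid), len(grid[0]), sum(map(sum, grid)))
```

A multi-channel variant (one channel per road type) would follow the same pattern, with the channel chosen from a per-polyline attribute.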

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The approach encodes geometry and road types from SDMap into features and integrates these into BEV features for use in the decoder, which improves performance.
2. To explore the mutual influence of topology and geometry, this work introduces a topology-guided self-attention mechanism to aggregate features from neighbouring lanes.
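One way to picture the topology-guided self-attention mentioned in the second strength is as a mask on the attention weights between lane queries. The sketch below assumes a hard mask derived from a predicted lane adjacency matrix, so that each lane attends only to itself and to lanes it is linked to in either direction. The thresholding rule and the name `topology_masked_attention` are hypothetical, and the paper's actual mechanism may differ.

```python
import math


def topology_masked_attention(scores, adjacency, threshold=0.5):
    """Row-wise softmax over attention logits, restricted to linked lanes.

    scores[i][j]: raw attention logit from lane query i to lane query j.
    adjacency[i][j]: predicted probability that lane i connects to lane j.
    threshold: hypothetical cut-off for treating a link as present.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # A lane may attend to itself and to any lane linked in either direction.
        allowed = [
            j for j in range(n)
            if j == i or adjacency[i][j] > threshold or adjacency[j][i] > threshold
        ]
        row_max = max(scores[i][j] for j in allowed)  # for numerical stability
        exps = {j: math.exp(scores[i][j] - row_max) for j in allowed}
        total = sum(exps.values())
        weights.append([exps.get(j, 0.0) / total for j in range(n)])
    return weights


if __name__ == "__main__":
    logits = [[0.0, 0.0, 0.0] for _ in range(3)]
    adj = [[0.0, 0.9, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for row in topology_masked_attention(logits, adj):
        print(row)
```

In this toy case lanes 0 and 1 are linked, so they share attention with each other, while the isolated lane 2 attends only to itself.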

Weaknesses

1. *Performance Drop in Model Combination*: Combining LaneSegNet with P-MapNet results in decreased performance, which is unexpected and requires clarification. An explanation for this discrepancy, particularly given that P-MapNet also employs a cross-attention mechanism, would provide valuable insight into the interaction between the two models.
2. *Limited Novelty in SDMap Encoding and Fusion*: The methods used for map tokenization and fusion lack significant novelty, with SDMap encoding resemb

Reviewer 02 · Rating 5 · Confidence 4

Strengths

1. The writing and presentation of this paper are good.
2. The authors provide detailed ablation studies to show how the proposed SDMap prior fusion and topology-guided decoder improve performance.
3. The authors recognize the noise issue in SDMap and mitigate the performance degradation through data augmentation during training.
4. The proposed method achieves high performance compared to recent state-of-the-art methods.

Weaknesses

1. The SDMap Prior Fusion section lacks technical innovation. The authors combine two SDMap representation methods to achieve better results, but both methods are derived from previous works: spatial map encoding from P-MapNet and map tokenization from SMERF. The authors should explain the differences between the proposed fusion method and a simple combination of P-MapNet and SMERF (for example: (1) using both spatial map encoding and map tokenization as keys/values in cross-attention; (2) conca

Reviewer 03 · Rating 5 · Confidence 5

Strengths

- SDMap is a much more easily accessible map prior compared to HDMap and shows the basic structures of a road network. The introduction of it is intuitive and of great practical value.
- The fusion of BEV feature and SDMap priors at different levels is simple but effective, leading to a significant performance gain, as demonstrated in the experiments.
- The study on the effect of mis-aligned SDMap is novel, which is a common case due to the SDMap collection methods.

Weaknesses

- Although the metrics have been elevated greatly in OpenLane-V2, the generated online map still seems very terrible and contains **lots of** significant errors, overlaps, and wrong detections, as displayed in the qualitative results on Page 9. It prevents TopoSD from being put into real use.
- Although the study on the influence of SDMap error is novel, the experimental results seem contradictory to the claims TopoSD proposes, which makes this section of study ill-defined. Since the TopoSD is r

Taxonomy

Topics: Automated Road and Building Extraction · Advanced Neural Network Applications · Video Surveillance and Tracking Methods