RelTopo: Multi-Level Relational Modeling for Driving Scene Topology Reasoning
Yueru Luo, Changqing Zhou, Yiming Yang, Erlong Li, Chao Zheng, Shuqi Mei, Shuguang Cui, Zhen Li

TL;DR
RelTopo introduces a multi-level relational modeling framework that jointly enhances perception and topology reasoning in autonomous driving, significantly improving detection and connectivity inference accuracy.
Contribution
RelTopo systematically integrates relational cues across the perception, reasoning, and supervision levels, so that perception and topology reasoning are optimized jointly rather than in isolation.
Findings
Achieves state-of-the-art results on the OpenLane-V2 dataset
Improves detection accuracy by +3.1 in DET_l
Enhances topology reasoning with +5.3 in TOP_ll
Abstract
Accurate road topology reasoning is critical for autonomous driving, as it requires both perceiving road elements and understanding how lanes connect to each other (L2L) and to traffic elements (L2T). Existing methods often focus on either perception or L2L reasoning, leaving L2T underexplored and falling short of jointly optimizing perception and reasoning. Moreover, although topology prediction inherently involves relations, relational modeling itself is seldom incorporated into feature extraction or supervision. As humans naturally leverage contextual relationships to recognize road elements and infer their connectivity, we posit that relational modeling can likewise benefit both perception and reasoning, and that these two tasks should be mutually enhancing. To this end, we propose RelTopo, a multi-level relational modeling approach that systematically integrates relational cues across…
Peer Reviews
Decision·Submitted to ICLR 2026
1. The idea of the proposed Cross-View L2T head is interesting and provides valuable insights for future research. 2. The authors achieve state-of-the-art performance on the OpenLane-V2 benchmark and present a comprehensive ablation study. 3. The paper includes extensive visualizations of experimental results, which greatly help readers in analyzing and understanding the method’s effectiveness.
1. What is the difference between the proposed Geometry-Biased Self-Attention and the Geometric-Guided Self-Attention used in TopoFormer (CVPR 2025)? 2. In Section 1 (Introduction), the authors state that existing works suffer from fragmented task optimization, arguing that methods such as TopoMLP (ICLR 2024) fail to jointly optimize perception and reasoning modules. However, in the current design of RelTopo, the Perception Level and Reasoning Level are also implemented separately, and the loss
- The authors employ Geometry-Biased Self-Attention to improve lane perception and introduce a Geometry-Enhanced Module to enhance lane topology reasoning. The proposed approach performs effectively in both tasks. - The experiments in this paper demonstrate superior performance in lane detection, L2L reasoning, and L2T reasoning on the OpenLane-V2 dataset, compared to previous methods, with improvements observed in each of these tasks individually.
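The reviews refer to Geometry-Biased Self-Attention without describing its form. As a rough illustration of the general idea only, the sketch below adds a pairwise geometric bias to standard scaled dot-product attention logits. The bias used here (negative distance between lane centers), the `scale` parameter, and all function names are assumptions made for illustration, not the paper's formulation.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def distance_bias(centers, scale=1.0):
    """Hypothetical bias: negative Euclidean distance between lane centers,
    so that nearby lanes receive larger attention logits."""
    return [[-scale * math.dist(a, b) for b in centers] for a in centers]


def biased_self_attention(queries, keys, values, bias):
    """Scaled dot-product attention with an additive pairwise bias on the logits."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        logits = [
            sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        weights = softmax(logits)
        # Weighted sum of value vectors, one output channel at a time.
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out
```

With identical queries and keys, the output for each lane is dominated by the values of geometrically close lanes, which is the intuition behind biasing attention with geometry.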
- This paper lacks a clear motivation, which I consider indispensable. - This paper lacks experimental justification for the choice of Bezier curves for modeling and does not provide an explanation for why the Curve-Guided Cross-Attention mechanism is effective. - This paper does not explain the connection between the proposed modules, such as geometry-biased SA and curve-guided CA.
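For readers unfamiliar with the Bezier representation questioned above, a lane centerline can be described by a few control points and sampled at any resolution. The following is a minimal sketch of generic Bezier evaluation via Bernstein polynomials; the function names and the example control points are hypothetical and do not come from the paper.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve of arbitrary degree at parameter t in [0, 1]."""
    n = len(control_points) - 1
    dim = len(control_points[0])
    point = [0.0] * dim
    for i, cp in enumerate(control_points):
        # Bernstein basis polynomial B_{i,n}(t)
        weight = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        for d in range(dim):
            point[d] += weight * cp[d]
    return tuple(point)


def sample_curve(control_points, num_points):
    """Sample num_points (>= 2) evenly spaced parameter values along the curve."""
    return [
        bezier_point(control_points, k / (num_points - 1))
        for k in range(num_points)
    ]


# Hypothetical cubic lane centerline in bird's-eye-view coordinates (metres).
lane_ctrl = [(0.0, 0.0), (5.0, 0.5), (10.0, 2.0), (15.0, 5.0)]
lane_pts = sample_curve(lane_ctrl, 11)
```

The curve always starts at the first control point and ends at the last, so four control points are enough to fix a lane's endpoints and its overall bend.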
1. The designs of all components targeting perception and relational modeling are reasonable and insightful. The authors design a more effective cross-attention mechanism to aggregate point features and update the query feature, thereby capturing reasonable dependency relationships. 2. Explicit geometric similarities, such as angle similarities and distance embeddings, are incorporated into feature learning, which gives the model better and fuller supervision.
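The angle and distance cues mentioned in this review can be made concrete with a small example. The sketch below computes, for every ordered lane pair, the distance from one lane's end point to the other's start point and the cosine similarity of their overall directions. How RelTopo actually embeds or supervises these quantities is not specified in this excerpt, so the function names and the choice of end-to-start distance are assumptions.

```python
import math


def direction(lane):
    """Unit vector from a lane's first point to its last point."""
    (x0, y0), (x1, y1) = lane[0], lane[-1]
    norm = math.hypot(x1 - x0, y1 - y0)
    if norm == 0.0:
        return (0.0, 0.0)  # degenerate lane: no defined direction
    return ((x1 - x0) / norm, (y1 - y0) / norm)


def pairwise_geometry(lanes):
    """Return (dist, cos_sim) matrices over ordered lane pairs (i, j).

    dist[i][j]    : distance from the end of lane i to the start of lane j
    cos_sim[i][j] : cosine of the angle between the two lane directions
    """
    n = len(lanes)
    dist = [[0.0] * n for _ in range(n)]
    cos_sim = [[0.0] * n for _ in range(n)]
    dirs = [direction(lane) for lane in lanes]
    for i in range(n):
        for j in range(n):
            dist[i][j] = math.dist(lanes[i][-1], lanes[j][0])
            cos_sim[i][j] = dirs[i][0] * dirs[j][0] + dirs[i][1] * dirs[j][1]
    return dist, cos_sim
```

A small end-to-start distance combined with a high direction similarity is a natural hint that lane i continues into lane j, while a small distance with low similarity suggests a turn.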
1. Many individually novel parts are proposed to solve each sub-problem of the topology prediction task, but the work as a whole simply combines them to achieve better performance; it does not step outside the framework of methods like TopoNet or LaneSegNet, such as using BEV features for lane-to-lane connectivity and BEV-FV features for lane-to-traffic connectivity. Finally, using MLPs to predict the relation matrix is not novel. 2. The inner contributions of L2L (Learnin
Taxonomy
Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization · Natural Language Processing Techniques
Methods: Contrastive Learning · Focus
