TL;DR
Laneformer introduces a transformer-based architecture for lane detection that effectively captures long-range relations and semantic context, achieving state-of-the-art results with minimal latency overhead.
Contribution
The paper proposes a lane detection model that combines deformable pixel-wise self-attention, row and column self-attention, and object-aware context within a transformer encoder-decoder framework.
Findings
Achieves a 77.1% F1 score on the CULane benchmark.
Outperforms CNN-based methods at capturing long-range relations between lane points.
Demonstrates that integrating object context improves lane detection accuracy.
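For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score is derived from matched lane predictions. It assumes the usual convention on CULane, where a predicted lane counts as a true positive when its IoU with a ground-truth lane exceeds a threshold (0.5 in the standard protocol). The function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from match counts.

    tp: predicted lanes matched to a ground-truth lane
    fp: predicted lanes with no matching ground-truth lane
    fn: ground-truth lanes with no matching prediction
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to illustrate the arithmetic.
score = f1_score(tp=771, fp=229, fn=229)
print(f"F1 = {score:.3f}")
```

With equal numbers of false positives and false negatives, precision and recall coincide, so the F1 score equals both.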
Abstract
We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection that is a long-standing research topic for visual perception in autonomous driving. The dominant paradigms rely on purely CNN-based architectures which often fail in incorporating relations of long-range lane points and global contexts induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by recent advances of the transformer encoder-decoder architecture in various vision tasks, we move forwards to design a new end-to-end Laneformer architecture that revolutionizes the conventional transformers into better capturing the shape and semantic characteristics of lanes, with minimal overhead in latency. First, coupling with deformable pixel-wise self-attention in the encoder, Laneformer presents two new row and column self-attention operations to efficiently…
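The abstract is truncated before it describes the row and column self-attention operations in detail, so the sketch below is only one plausible reading, not the authors' implementation. It assumes an axial-style scheme in which each pixel attends to the other pixels in its own row, then in its own column, and the results are added to the input as a residual. It uses NumPy and omits the learned query, key, and value projections; `axis_self_attention` and `row_column_attention` are hypothetical names.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def axis_self_attention(feat: np.ndarray, axis: int) -> np.ndarray:
    """Scaled dot-product self-attention restricted to one spatial axis.

    feat: feature map of shape (H, W, C).
    axis=1: each pixel attends to the pixels in its row (over W).
    axis=0: each pixel attends to the pixels in its column (over H).
    Assumption: identity projections stand in for learned Q/K/V weights.
    """
    # Arrange the data as (num_lines, line_length, C).
    x = feat if axis == 1 else feat.transpose(1, 0, 2)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    out = softmax(scores, axis=-1) @ x
    return out if axis == 1 else out.transpose(1, 0, 2)


def row_column_attention(feat: np.ndarray) -> np.ndarray:
    """Residual sum of row-wise and column-wise attention (assumed fusion)."""
    return feat + axis_self_attention(feat, axis=1) + axis_self_attention(feat, axis=0)


# Hypothetical feature map: 4 rows, 6 columns, 8 channels.
feat = np.random.default_rng(0).normal(size=(4, 6, 8))
out = row_column_attention(feat)
print(out.shape)
```

Restricting attention to a single axis compares on the order of H·W·(H + W) pixel pairs instead of the (H·W)² pairs of full pixel-wise attention, which is consistent with the abstract's emphasis on efficiency and on the elongated shape of lanes.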
Taxonomy
Methods: Softmax · Linear Layer
