Navigation-Guided Sparse Scene Representation for End-to-End Autonomous Driving

Peidong Li; Dixiao Cui

arXiv:2409.18341 · cs.CV · March 4, 2025

TL;DR

This paper introduces SSR, a navigation-guided sparse scene representation framework that enhances end-to-end autonomous driving by reducing reliance on expensive annotations and improving efficiency and performance.

Contribution

SSR is a novel framework that represents the driving scene with only 16 navigation-guided tokens, eliminating the need for supervised perception sub-tasks and improving both efficiency and planning performance.
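To make the idea of a small, navigation-conditioned token set concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the SSR implementation in peidongli/ssr: it assumes the scene is available as a flattened bird's-eye-view (BEV) feature map, that each discrete navigation command selects its own set of 16 learned query vectors, and that plain single-head cross-attention pools the BEV features into those queries. The paper's actual token-extraction module may differ; the names `sparse_scene_tokens` and `nav_queries`, the three-command setup, and all dimensions other than 16 are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_scene_tokens(bev_feats, nav_queries, command):
    """Pool a dense BEV map into a few navigation-conditioned tokens (hypothetical).

    bev_feats:   (H*W, D) flattened BEV features.
    nav_queries: (num_commands, K, D) learned queries, one set per command.
    command:     integer index of the navigation command.
    Returns (K, D) sparse scene tokens.
    """
    q = nav_queries[command]                 # (K, D) queries chosen by the command
    d = q.shape[-1]
    scores = q @ bev_feats.T / np.sqrt(d)    # (K, H*W) scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each token's weights over BEV cells sum to 1
    return attn @ bev_feats                  # (K, D) attention-weighted pooling

rng = np.random.default_rng(0)
H, W, D, K = 50, 50, 64, 16                  # K = 16 tokens; H, W, D are arbitrary
bev = rng.standard_normal((H * W, D))
queries = rng.standard_normal((3, K, D))     # assumed commands: 0=left, 1=right, 2=straight
tokens = sparse_scene_tokens(bev, queries, command=2)
print(tokens.shape)  # (16, 64)
```

Whatever the real module looks like, the design point carries over: the downstream planner attends to 16 vectors instead of the 2,500 BEV cells in this toy example, and changing the navigation command changes which scene content those vectors summarize.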

Findings

01. 27.2% relative reduction in L2 error compared to UniAD on nuScenes
02. 51.6% decrease in collision rate compared to UniAD on nuScenes
03. 10.9x faster inference
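The headline numbers are relative improvements on open-loop planning metrics. As a sketch of the arithmetic, assuming L2 error is the mean Euclidean distance between predicted and ground-truth ego waypoints and that "relative reduction" means (baseline − ours) / baseline, the snippet below computes both. The trajectories and metric values are hypothetical and are not taken from the paper; the helper names are illustrative.

```python
import math

def l2_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth waypoints.

    pred, gt: equal-length, non-empty lists of (x, y) positions in metres.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("trajectories must be non-empty and equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def relative_reduction(baseline, ours):
    """Fractional improvement of `ours` over `baseline` for a lower-is-better metric."""
    return (baseline - ours) / baseline

# Hypothetical trajectories (not from the paper).
gt = [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)]
pred = [(0.3, 2.4), (0.0, 4.5), (-0.6, 6.8)]
print(round(l2_error(pred, gt), 3))              # 0.667
print(f"{relative_reduction(1.00, 0.728):.1%}")  # 27.2%
```

The same formula applies to the collision-rate figure. Benchmarks differ in how they average L2 error over the prediction horizon, so relative numbers are only comparable under the same evaluation protocol.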

Abstract

End-to-End Autonomous Driving (E2EAD) methods typically rely on supervised perception tasks to extract explicit scene information (e.g., objects, maps). This reliance necessitates expensive annotations and constrains deployment and data scalability in real-time applications. In this paper, we introduce SSR, a novel framework that utilizes only 16 navigation-guided tokens as Sparse Scene Representation, efficiently extracting crucial scene information for E2EAD. Our method eliminates the need for human-designed supervised sub-tasks, allowing computational resources to concentrate on essential elements directly related to navigation intent. We further introduce a temporal enhancement module, aligning predicted future scenes with actual future scenes through self-supervision. SSR achieves a 27.2% relative reduction in L2 error and a 51.6% decrease in collision rate compared to UniAD in nuScenes,…
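The temporal enhancement module is described only as aligning predicted future scenes with actual future scenes through self-supervision. One way to picture such an objective is sketched below, under loudly stated assumptions: a hypothetical linear predictor maps the current sparse tokens plus a planned ego action to next-frame tokens, and a mean-squared-error loss compares them with tokens extracted from the real next frame. The predictor form, the MSE choice, the 12-dimensional action (6 waypoints × 2 coordinates), and every function name here are our illustrations, not the paper's design.

```python
import numpy as np

def predict_future_tokens(tokens, ego_action, w_tok, w_act):
    """Hypothetical one-step predictor: next tokens from current tokens + planned action.

    tokens:     (K, D) current sparse scene tokens
    ego_action: (A,)   planned ego motion, e.g. flattened future waypoints
    w_tok:      (D, D) and w_act: (A, D) stand-ins for learned weights
    """
    # The (D,) action embedding is broadcast-added to every token.
    return tokens @ w_tok + ego_action @ w_act

def temporal_alignment_loss(pred_future, actual_future):
    """Mean squared error between predicted and observed future tokens.

    The target comes from the next frame itself, so no human labels are
    needed; in a training framework it would commonly be detached
    (stop-gradient) so only the predictor is pushed toward the target.
    """
    return float(np.mean((pred_future - actual_future) ** 2))

rng = np.random.default_rng(1)
K, D, A = 16, 64, 12                          # 16 tokens; D and A are arbitrary choices
tokens_t = rng.standard_normal((K, D))
tokens_next = rng.standard_normal((K, D))     # stand-in for tokens extracted at t+1
action = rng.standard_normal(A)
w_tok = 0.1 * rng.standard_normal((D, D))
w_act = 0.1 * rng.standard_normal((A, D))

pred = predict_future_tokens(tokens_t, action, w_tok, w_act)
loss = temporal_alignment_loss(pred, tokens_next)
print(pred.shape)  # (16, 64)
```

Conditioning the prediction on the planned action is what would tie such a loss to planning: a predictor that ignores where the ego vehicle intends to go cannot match the observed future scene well.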

Code & Models

Repositories

peidongli/ssr
PyTorch · Official

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications · Human-Automation Interaction and Safety

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings