TL;DR
PRIX is an end-to-end autonomous driving model that plans from camera data alone. It uses a new transformer module, CaRT, to predict safe trajectories efficiently from raw pixels, and it matches state-of-the-art performance on NavSim and nuScenes.
Contribution
The paper introduces PRIX, a camera-only planning architecture built around a new Context-aware Recalibration Transformer (CaRT) module. PRIX achieves high performance without an explicit BEV representation or LiDAR input, and is positioned as suitable for real-world deployment.
Findings
PRIX matches state-of-the-art performance on the NavSim and nuScenes benchmarks.
PRIX is faster at inference and smaller in model size than the state-of-the-art methods it is compared with.
PRIX operates solely on camera data, without an explicit BEV representation or LiDAR input.
Abstract
While end-to-end autonomous driving models show promising results, their practical deployment is often hindered by large model sizes, a reliance on expensive LiDAR sensors, and computationally intensive BEV feature representations. This limits their scalability, especially for mass-market vehicles equipped only with cameras. To address these challenges, we propose PRIX (Plan from Raw Pixels), a novel and efficient end-to-end driving architecture that operates using only camera data, without an explicit BEV representation and without LiDAR. PRIX leverages a visual feature extractor coupled with a generative planning head to predict safe trajectories directly from raw pixel inputs. A core component of our architecture is the Context-aware Recalibration Transformer (CaRT), a novel module designed to effectively enhance multi-level visual features for more robust planning. We…
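The abstract names CaRT as a module that enhances multi-level visual features but does not describe how it is built. As a purely illustrative sketch of what "context-aware recalibration" of a feature map could look like, the NumPy code below runs single-head self-attention over the spatial tokens of each feature level and uses the pooled context to gate channels. The function name `recalibrate`, the gating scheme, the shapes, and the random weights are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def recalibrate(feat, wq, wk, wv, wg):
    """Hypothetical context-aware channel recalibration (NOT the paper's CaRT).

    feat: feature map of shape (C, H, W); wq, wk, wv, wg: (C, C) weight matrices.
    Returns a gated feature map with the same shape as feat.
    """
    c, h, w = feat.shape
    tokens = feat.reshape(c, h * w).T                # (N, C), one token per pixel
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)    # (N, N) attention weights
    context = attn @ v                               # (N, C) context per token
    # Pool context over tokens and squash to a per-channel gate in (0, 1).
    gate = 1.0 / (1.0 + np.exp(-(context.mean(axis=0) @ wg)))
    return feat * gate[:, None, None]

# Assumed setup: three feature levels with 8 channels at decreasing resolution.
rng = np.random.default_rng(0)
levels = [rng.standard_normal((8, s, s)) for s in (16, 8, 4)]
weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
outputs = [recalibrate(f, *weights) for f in levels]
```

Each level keeps its shape, so in a full model the recalibrated maps could be passed on to a planning head unchanged in layout. The same weights are reused across levels here only to keep the sketch short.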
