TL;DR
DinoRADE is a radar-camera fusion method using vision foundation model features that improves multi-class object detection, especially for vulnerable road users, in adverse weather conditions.
Contribution
It introduces a Radar-centered detection pipeline with deformable cross-attention and leverages DINOv3 features, achieving superior performance on the K-Radar dataset.
Findings
Outperforms recent Radar-camera approaches by 12.1% on the K-Radar dataset.
Reports detection performance for five object classes across all weather conditions.
First to report individual detection performance for VRUs in adverse weather.
Abstract
Reliable and weather-robust perception systems are essential for safe autonomous driving and typically employ multi-modal sensor configurations to achieve comprehensive environmental awareness. While recent automotive FMCW Radar-based approaches achieved remarkable performance on detection tasks in adverse weather conditions, they exhibited limitations in resolving fine-grained spatial details particularly critical for detecting smaller and vulnerable road users (VRUs). Furthermore, existing research has not adequately addressed VRU detection in adverse weather datasets such as K-Radar. We present DinoRADE, a Radar-centered detection pipeline that processes dense Radar tensors and aggregates vision features around transformed reference points in the camera perspective via deformable cross-attention. Vision features are provided by a DINOv3 Vision Foundation Model. We present a…
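The aggregation step described in the abstract (transforming Radar reference points into the camera perspective and gathering vision features around them via deformable cross-attention) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head and a single feature scale, a pinhole camera whose intrinsics `K` are already scaled to the feature-map resolution, and hypothetical names throughout (`project_to_image`, `bilinear_sample`, `deformable_cross_attention`, `W_off`, `W_attn`, `W_val`). The feature map `feat` stands in for whatever dense features a vision backbone such as DINOv3 would provide.

```python
import numpy as np

def project_to_image(points_radar, T_cam_from_radar, K):
    """Project (N, 3) Radar-frame points to feature-map pixels (N, 2) and depth (N,).

    T_cam_from_radar: assumed 4x4 rigid transform, K: assumed 3x3 pinhole intrinsics.
    """
    ones = np.ones((len(points_radar), 1))
    homo = np.concatenate([points_radar, ones], axis=1)      # (N, 4)
    cam = (T_cam_from_radar @ homo.T).T[:, :3]               # (N, 3) camera frame
    depth = cam[:, 2]
    uvw = (K @ cam.T).T                                      # (N, 3)
    uv = uvw[:, :2] / np.clip(depth[:, None], 1e-6, None)    # perspective divide
    return uv, depth

def bilinear_sample(feat, uv):
    """Sample feat (H, W, C) at float locations uv (..., 2), clamping to the border."""
    H, W, _ = feat.shape
    u = np.clip(uv[..., 0], 0, W - 1)
    v = np.clip(uv[..., 1], 0, H - 1)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[..., None]
    dv = (v - v0)[..., None]
    top = feat[v0, u0] * (1 - du) + feat[v0, u1] * du
    bottom = feat[v1, u0] * (1 - du) + feat[v1, u1] * du
    return top * (1 - dv) + bottom * dv                      # (..., C)

def deformable_cross_attention(queries, ref_uv, feat, W_off, W_attn, W_val):
    """Single-head, single-scale deformable cross-attention (simplified sketch).

    queries: (N, D) Radar query features
    ref_uv:  (N, 2) projected reference points on the feature map
    feat:    (H, W, C) camera feature map
    W_off:   (D, 2 * P) predicts P sampling offsets per query
    W_attn:  (D, P)     predicts P attention logits per query
    W_val:   (C, D_out) value projection
    """
    n_queries = queries.shape[0]
    n_points = W_attn.shape[1]
    offsets = (queries @ W_off).reshape(n_queries, n_points, 2)
    logits = queries @ W_attn
    logits = logits - logits.max(axis=1, keepdims=True)      # stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)   # (N, P)
    sampled = bilinear_sample(feat, ref_uv[:, None, :] + offsets)  # (N, P, C)
    values = sampled @ W_val                                 # (N, P, D_out)
    return (weights[..., None] * values).sum(axis=1)         # (N, D_out)
```

Each Radar query looks up only a few learned sampling locations around its projected reference point instead of attending to every image token, which keeps the cost linear in the number of queries. A full model would typically add multiple heads, multiple feature scales, and a check that discards points with non-positive depth.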
