Radar Spectra-Language Model for Automotive Scene Parsing

Mariia Pushkareva; Yuri Feldman; Csaba Domokos; Kilian Rambach; Dotan Di Castro

arXiv:2406.02158 · cs.CV · August 12, 2024

TL;DR

This paper introduces a radar spectra-language model that enhances interpretability and scene understanding in autonomous driving by enabling free-text querying of radar spectra, improving tasks like scene retrieval, free space segmentation, and object detection.

Contribution

We develop a novel radar spectra-language model that leverages vision-language embeddings to interpret radar spectra and improve scene perception in autonomous driving.

Findings

01. Improved free space segmentation using radar spectra embeddings.

02. Enhanced object detection performance with spectra-based features.

03. Effective querying of radar spectra for scene elements using natural language.

Abstract

Radar sensors are low cost, long-range, and weather-resilient. Therefore, they are widely used for driver assistance functions, and are expected to be crucial for the success of autonomous driving in the future. In many perception tasks only pre-processed radar point clouds are considered. In contrast, radar spectra are a raw form of radar measurements and contain more information than radar point clouds. However, radar spectra are rather difficult to interpret. In this work, we aim to explore the semantic information contained in spectra in the context of automated driving, thereby moving towards better interpretability of radar spectra. To this end, we create a radar spectra-language model, allowing us to query radar spectra measurements for the presence of scene elements using free text. We overcome the scarcity of radar spectra data by matching the embedding space of an existing…
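The free-text querying described in the abstract can be illustrated with a minimal sketch. It assumes that radar spectra and text prompts have already been mapped into a shared embedding space; the encoders themselves are not shown, and the function names and toy vectors below are hypothetical, not taken from the paper. Ranking by cosine similarity is a common choice for this kind of retrieval and is used here as an assumption.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_spectra_by_query(spectra_emb, text_emb):
    """Rank radar spectra embeddings by cosine similarity to one text embedding.

    spectra_emb: array of shape (N, D), one row per radar frame (hypothetical).
    text_emb:    array of shape (D,), embedding of the free-text query.
    Returns (indices sorted best-first, similarity score per frame).
    """
    s = l2_normalize(np.asarray(spectra_emb, dtype=float))
    t = l2_normalize(np.asarray(text_emb, dtype=float))
    scores = s @ t                  # cosine similarity per frame
    order = np.argsort(-scores)     # descending similarity
    return order, scores


# Toy 3-dimensional embeddings, for illustration only.
spectra_emb = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
text_emb = np.array([0.0, 2.0, 0.0])  # stands in for an embedded text query

order, scores = rank_spectra_by_query(spectra_emb, text_emb)
```

In this toy case the second frame aligns exactly with the query direction, so it ranks first, followed by the third frame, which partially overlaps with it.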

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

This paper introduces text information into feature fusion to improve the interpretability of radar spectra.

Weaknesses

1. The framework seems to be a simple combination of existing methods. I did not see any design specific to the radar spectra-language model.
2. The detection experiment is not compared with SOTA methods such as RODNet.
3. What is [20] in Table 3?
4. If the description includes information about multiple objects, how is the text aligned with the corresponding object?

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

Pre-training the radar spectra encoder to optimize similarity to a fine-tuned OpenCLIP is novel. It allows pre-training without the need for an explicit radar spectra dataset.
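As an illustration of the pre-training idea the reviewer describes, the following is a minimal sketch of an embedding-matching objective. It assumes paired radar and camera frames, a frozen vision-language model that supplies the target image embeddings, and a mean cosine-distance loss. The exact loss used in the paper is not stated on this page, so the function below is a hypothetical stand-in rather than the authors' formulation.

```python
import numpy as np


def cosine_matching_loss(radar_emb, image_emb, eps=1e-12):
    """Mean cosine distance between paired radar and image embeddings.

    radar_emb: (B, D) outputs of a trainable radar encoder (hypothetical).
    image_emb: (B, D) targets from a frozen vision-language model (assumed).
    Returns a scalar in [0, 2]; 0 means every pair is perfectly aligned.
    """
    radar_emb = np.asarray(radar_emb, dtype=float)
    image_emb = np.asarray(image_emb, dtype=float)
    r = radar_emb / np.maximum(np.linalg.norm(radar_emb, axis=1, keepdims=True), eps)
    v = image_emb / np.maximum(np.linalg.norm(image_emb, axis=1, keepdims=True), eps)
    cos = np.sum(r * v, axis=1)     # per-pair cosine similarity
    return float(np.mean(1.0 - cos))


# Toy targets and three candidate sets of radar embeddings, for illustration only.
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
aligned = np.array([[2.0, 0.0], [0.0, 3.0]])      # same directions as targets
orthogonal = np.array([[0.0, 1.0], [1.0, 0.0]])   # perpendicular to targets
opposite = -targets                                # reversed directions

loss_aligned = cosine_matching_loss(aligned, targets)
loss_orthogonal = cosine_matching_loss(orthogonal, targets)
loss_opposite = cosine_matching_loss(opposite, targets)
```

Because the loss depends only on direction, scaling the radar embeddings does not change it; minimizing it pulls the radar encoder's outputs toward the image embedding space that the text encoder already shares.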

Weaknesses

There is no discussion of what remains difficult or unreliable. An analysis across input scenes of varying difficulty would help answer this question.

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. To the best of my knowledge, this is the first paper that tries to build a radar spectra-language model.
2. The VLM fine-tuned for autonomous driving scenes works much better than the off-the-shelf CLIP.
3. The zero-shot retrieval ability of RSLM is impressive, especially for small objects such as pedestrians and cyclists.

Weaknesses

1. The author seems to lack paper writing skills. All the figures are unaesthetic, low-resolution bitmaps, and some of them are unnecessary. For Figure 4a, it would be better to describe the loss functions with a mathematical formulation instead of Python code. For Figure 4b, such a simple architecture could be moved to the supplementary material.
2. Changing the position encoding without fine-tuning may cause a performance drop, and splitting the image may break some objects on the edge. A better and more comm…

Taxonomy

Topics: Speech Recognition and Synthesis