Learned Multimodal Compression for Autonomous Driving

Hadi Hadizadeh; Ivan V. Bajić

arXiv:2408.08211 · eess.IV · August 16, 2024

Open Access

TL;DR

This paper investigates learned multimodal compression of autonomous driving sensor data, focusing on camera and LiDAR, and reports that joint coding of the fused modalities gives better 3D object detection results than coding one modality first and conditionally coding the other.

Contribution

The paper introduces and compares several learned coding schemes for camera and LiDAR data, and highlights the effectiveness of joint coding for autonomous driving applications.

Findings

01. Joint coding of fused modalities outperforms the conditional coding schemes.

02. Coding the fused modalities improves 3D object detection results.

03. Experiments on the nuScenes dataset confirm the effectiveness of the approach.

Abstract

Autonomous driving sensors generate an enormous amount of data. In this paper, we explore learned multimodal compression for autonomous driving, specifically targeted at 3D object detection. We focus on camera and LiDAR modalities and explore several coding approaches. One approach involves joint coding of fused modalities, while others involve coding one modality first, followed by conditional coding of the other modality. We evaluate the performance of these coding schemes on the nuScenes dataset. Our experimental results indicate that joint coding of fused modalities yields better results compared to the alternatives.
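
The structural difference between the two families of schemes named in the abstract can be illustrated with a toy sketch. This is not the authors' learned codec: element-wise addition stands in for learned fusion, uniform scalar quantization stands in for the learned encoder, subtracting the decoded camera features stands in for conditional coding, and a zeroth-order entropy estimate stands in for the entropy model. All function names and the `step` parameter are hypothetical, and the toy says nothing about which scheme performs better.

```python
import math
from collections import Counter


def quantize(values, step=0.5):
    """Uniform scalar quantization to integer symbols (stand-in for a learned encoder)."""
    return [round(v / step) for v in values]


def empirical_bits(symbols):
    """Zeroth-order entropy estimate of the total bits needed for a symbol sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def joint_coding(camera_feat, lidar_feat, step=0.5):
    """Fuse both modalities first, then code the fused representation once."""
    # Assumption: element-wise addition as a placeholder for learned fusion.
    fused = [c + l for c, l in zip(camera_feat, lidar_feat)]
    symbols = quantize(fused, step)
    return symbols, empirical_bits(symbols)


def conditional_coding(camera_feat, lidar_feat, step=0.5):
    """Code the camera features first, then code LiDAR given the decoded camera features."""
    cam_symbols = quantize(camera_feat, step)
    cam_decoded = [s * step for s in cam_symbols]
    # Assumption: a plain residual as a placeholder for learned conditional coding.
    residual = [l - c for l, c in zip(lidar_feat, cam_decoded)]
    res_symbols = quantize(residual, step)
    total_bits = empirical_bits(cam_symbols) + empirical_bits(res_symbols)
    return (cam_symbols, res_symbols), total_bits


if __name__ == "__main__":
    camera = [1.0, 2.0, 0.5, 1.5]
    lidar = [1.0, 2.5, 0.5, 1.0]
    print("joint:", joint_coding(camera, lidar))
    print("conditional:", conditional_coding(camera, lidar))
```

In the joint scheme a single bitstream carries the fused representation, while in the conditional scheme two bitstreams are produced and the second depends on the decoded first. That ordering dependency is the design choice the paper compares against.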

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems

Methods: Focus