Learned Multimodal Compression for Autonomous Driving
Hadi Hadizadeh, Ivan V. Bajić

TL;DR
This paper investigates learned multimodal compression techniques for autonomous driving sensors, focusing on camera and LiDAR data, and demonstrates that joint coding of fused modalities improves 3D object detection performance.
Contribution
It introduces and compares several learned coding schemes for multimodal data, highlighting the effectiveness of joint coding for autonomous driving applications.
Findings
Joint coding of fused modalities outperforms the conditional coding schemes that were compared.
Fused modality coding improves 3D detection results.
Experiments on the nuScenes dataset support these results.
Abstract
Autonomous driving sensors generate an enormous amount of data. In this paper, we explore learned multimodal compression for autonomous driving, specifically targeted at 3D object detection. We focus on camera and LiDAR modalities and explore several coding approaches. One approach involves joint coding of fused modalities, while others involve coding one modality first, followed by conditional coding of the other modality. We evaluate the performance of these coding schemes on the nuScenes dataset. Our experimental results indicate that joint coding of fused modalities yields better results compared to the alternatives.
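The two families of schemes described in the abstract can be contrasted with a toy sketch. This is not the authors' learned codec: uniform quantization stands in for the learned transforms, empirical entropy stands in for the learned entropy model, plain concatenation stands in for fusion, and all function names (`joint_coding`, `conditional_coding`, `predict`) are hypothetical. It only illustrates the data flow: code the fused representation once, or code one modality first and then code what the other adds given the decoded first.

```python
import math
from collections import Counter


def quantize(values, step):
    """Uniform scalar quantization: map each value to an integer bin index."""
    return [round(v / step) for v in values]


def dequantize(symbols, step):
    """Map integer bin indices back to reconstructed values."""
    return [s * step for s in symbols]


def empirical_bits(symbols):
    """Rate proxy (assumption): empirical entropy in bits, summed over symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def joint_coding(camera_feat, lidar_feat, step):
    """Fuse first (here: simple concatenation, an assumption), then code once."""
    fused = camera_feat + lidar_feat
    symbols = quantize(fused, step)
    return dequantize(symbols, step), empirical_bits(symbols)


def conditional_coding(first, second, step, predict):
    """Code `first` alone, then code `second` as a residual against a
    prediction made from the decoded `first` (hypothetical `predict`)."""
    s1 = quantize(first, step)
    first_hat = dequantize(s1, step)
    pred = predict(first_hat)
    residual = [b - p for b, p in zip(second, pred)]
    s2 = quantize(residual, step)
    second_hat = [p + r for p, r in zip(pred, dequantize(s2, step))]
    return first_hat, second_hat, empirical_bits(s1) + empirical_bits(s2)


if __name__ == "__main__":
    # Made-up feature vectors, purely for illustration.
    cam = [0.1, 0.9, 1.4, 2.2]
    lid = [0.2, 1.0, 1.5, 2.0]
    _, bits_joint = joint_coding(cam, lid, step=0.5)
    _, _, bits_cond = conditional_coding(lid, cam, step=0.5, predict=lambda x: x)
    print("joint bits:", bits_joint, "conditional bits:", bits_cond)
```

The numbers this toy produces say nothing about which scheme is better in practice; the paper's comparison is based on learned codecs evaluated on 3D object detection.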
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Focus
