Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation

Yuxin Li; Yiheng Li; Xulei Yang; Mengying Yu; Zihang Huang; Xiaojun Wu; Chai Kiat Yeo

arXiv:2410.07268·cs.CV·October 11, 2024

Open Access

TL;DR

This paper proposes a content-aware input pruning method for Bird's-Eye-View (BEV) based multi-modal perception in autonomous driving. It reduces computational load while maintaining accuracy by removing non-essential sensor regions before processing.

Contribution

It introduces the first input pruning technique that leverages BEV representation to identify and eliminate unnecessary sensor data, enhancing efficiency in multi-modal perception systems.

Findings

01 Significant reduction in computational overhead.

02 Perception accuracy maintained after pruning.

03 Validated on the NuScenes dataset.

Abstract

In the landscape of autonomous driving, Bird's-Eye-View (BEV) representation has recently garnered substantial academic attention, serving as a transformative framework for the fusion of multi-modal sensor inputs. This BEV paradigm effectively shifts the sensor fusion challenge from a rule-based methodology to a data-centric approach, thereby facilitating more nuanced feature extraction from an array of heterogeneous sensors. Notwithstanding its evident merits, the computational overhead associated with BEV-based techniques often mandates high-capacity hardware infrastructures, thus posing challenges for practical, real-world implementations. To mitigate this limitation, we introduce a novel content-aware multi-modal joint input pruning technique. Our method leverages BEV as a shared anchor to algorithmically identify and eliminate non-essential sensor regions prior to their…
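The idea of using BEV as a shared anchor for input pruning can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a BEV keep-mask is already available (how the paper derives it is not shown in the abstract), and it covers only a point-cloud input. The names `bev_cell`, `prune_points`, and `keep_mask`, and the grid parameters, are hypothetical.

```python
import math


def bev_cell(x, y, x_min, y_min, cell_size):
    """Map a ground-plane coordinate (metres) to a (row, col) BEV grid index."""
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y - y_min) / cell_size)
    return row, col


def prune_points(points, keep_mask, x_min, y_min, cell_size):
    """Keep only the points whose BEV cell is marked True in keep_mask.

    points    : iterable of (x, y, z) tuples in the BEV frame
    keep_mask : 2D list of booleans indexed as keep_mask[row][col]
                (assumed given; producing it is outside this sketch)
    """
    rows = len(keep_mask)
    cols = len(keep_mask[0]) if rows else 0
    kept = []
    for x, y, z in points:
        row, col = bev_cell(x, y, x_min, y_min, cell_size)
        inside = 0 <= row < rows and 0 <= col < cols
        # Points outside the grid, or in cells marked non-essential, are dropped
        if inside and keep_mask[row][col]:
            kept.append((x, y, z))
    return kept


# Hypothetical 2x2 BEV grid covering x, y in [0, 2) m with 1 m cells
keep_mask = [[True, False],
             [False, True]]
points = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.2), (1.5, 1.5, 1.0), (5.0, 5.0, 0.0)]
kept = prune_points(points, keep_mask, x_min=0.0, y_min=0.0, cell_size=1.0)
print(kept)
```

Because the mask lives in BEV space, the same mask could in principle be projected into each sensor's own coordinates, so that every modality discards the same regions before any feature extraction runs. That projection step is omitted here.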

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Speech and dialogue systems · Video Analysis and Summarization

Methods: Pruning