Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation
Yuxin Li, Yiheng Li, Xulei Yang, Mengying Yu, Zihang Huang, Xiaojun Wu, Chai Kiat Yeo

TL;DR
This paper proposes a content-aware input pruning method for Bird's-Eye-View based multi-modal perception in autonomous driving, reducing computational load while maintaining accuracy by removing non-essential sensor regions before processing.
Contribution
It introduces the first input pruning technique that leverages BEV representation to identify and eliminate unnecessary sensor data, enhancing efficiency in multi-modal perception systems.
Findings
Significant reduction in computational overhead.
Maintained perception accuracy after pruning.
Validated on the nuScenes dataset with positive results.
Abstract
In the landscape of autonomous driving, Bird's-Eye-View (BEV) representation has recently garnered substantial academic attention, serving as a transformative framework for the fusion of multi-modal sensor inputs. This BEV paradigm effectively shifts the sensor fusion challenge from a rule-based methodology to a data-centric approach, thereby facilitating more nuanced feature extraction from an array of heterogeneous sensors. Notwithstanding its evident merits, the computational overhead associated with BEV-based techniques often mandates high-capacity hardware infrastructures, thus posing challenges for practical, real-world implementations. To mitigate this limitation, we introduce a novel content-aware multi-modal joint input pruning technique. Our method leverages BEV as a shared anchor to algorithmically identify and eliminate non-essential sensor regions prior to their…
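The idea of using the BEV grid as a shared anchor for pruning can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before it describes how the importance of a region is determined, so the sketch simply assumes a boolean BEV mask (`keep_mask`, a hypothetical name) is already available and shows only how such a mask could drop LiDAR points before they reach the backbone. The function name, the grid ranges, and the row/column convention are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres, ego frame (assumed)


def prune_points_by_bev_mask(
    points: Sequence[Point],
    keep_mask: Sequence[Sequence[bool]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[Point]:
    """Keep only the points whose BEV cell is marked True in keep_mask.

    keep_mask[row][col] covers the BEV plane: rows index y, columns index x
    (an assumed convention). Points outside the grid are discarded.
    """
    rows, cols = len(keep_mask), len(keep_mask[0])
    x_min, x_max = x_range
    y_min, y_max = y_range
    cell_w = (x_max - x_min) / cols
    cell_h = (y_max - y_min) / rows

    kept: List[Point] = []
    for point in points:
        x, y = point[0], point[1]
        # Map metric coordinates to integer BEV cell indices.
        col = math.floor((x - x_min) / cell_w)
        row = math.floor((y - y_min) / cell_h)
        if 0 <= row < rows and 0 <= col < cols and keep_mask[row][col]:
            kept.append(point)
    return kept


# Usage: a 2 x 2 grid over a 2 m x 2 m area, keeping two diagonal cells.
mask = [[True, False], [False, True]]
cloud = [(0.5, 0.5, 0.0), (1.5, 0.5, 0.0), (0.5, 1.5, 0.0), (1.5, 1.5, 0.0), (5.0, 5.0, 0.0)]
pruned = prune_points_by_bev_mask(cloud, mask, (0.0, 2.0), (0.0, 2.0))
```

The sketch covers only the LiDAR side. Pruning camera input from the same mask would additionally require projecting BEV cells into each image using the camera calibration, which is omitted here.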
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Speech and dialogue systems · Video Analysis and Summarization
Methods: Pruning
