Challenges of Multi-Modal Coreset Selection for Depth Prediction
Viktor Moskvoretskii, Narek Alvandian

TL;DR
This paper explores the challenges of applying coreset selection techniques to multimodal data for depth prediction, emphasizing the need for specialized methods to handle inter-modal relationships.
Contribution
The paper adapts a state-of-the-art coreset selection method to multimodal data and analyzes the challenges of extending unimodal algorithms to multimodal scenarios.
Findings
Multimodal coreset selection faces unique challenges compared to unimodal cases.
Embedding aggregation and dimensionality reduction impact coreset effectiveness.
Specialized methods are needed to better capture inter-modal relationships.
Abstract
Coreset selection methods are effective in accelerating training and reducing memory requirements but remain largely unexplored in applied multimodal settings. We adapt a state-of-the-art (SoTA) coreset selection technique for multimodal data, focusing on the depth prediction task. Our experiments with embedding aggregation and dimensionality reduction approaches reveal the challenges of extending unimodal algorithms to multimodal scenarios, highlighting the need for specialized methods to better capture inter-modal relationships.
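The pipeline the abstract describes (aggregate per-modality embeddings, reduce their dimensionality, then select a coreset) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the selection method, so k-center greedy is swapped in as a simple stand-in, and the function names (`aggregate`, `pca_reduce`, `k_center_greedy`), the random embeddings, and all sizes are hypothetical.

```python
import numpy as np


def aggregate(rgb_emb, depth_emb, how="concat"):
    """Fuse per-sample embeddings from two modalities into one vector each."""
    if how == "concat":
        return np.concatenate([rgb_emb, depth_emb], axis=1)
    if how == "mean":
        # Averaging requires both modalities to share an embedding dimension.
        return (rgb_emb + depth_emb) / 2.0
    raise ValueError(f"unknown aggregation: {how}")


def pca_reduce(x, n_components):
    """Project mean-centered data onto its top principal components via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def k_center_greedy(x, budget, seed=0):
    """Stand-in selector (assumption): repeatedly pick the point farthest
    from the already selected set. Assumes budget <= len(x)."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(x)))]
    min_dist = np.linalg.norm(x - x[selected[0]], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Track each point's distance to its nearest selected center.
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return selected


# Hypothetical embeddings: 200 samples, 64 dimensions per modality.
rng = np.random.default_rng(0)
rgb_emb = rng.normal(size=(200, 64))
depth_emb = rng.normal(size=(200, 64))

fused = aggregate(rgb_emb, depth_emb, how="concat")
reduced = pca_reduce(fused, n_components=16)
coreset_idx = k_center_greedy(reduced, budget=20)
```

Swapping `how="concat"` for `how="mean"`, or changing `n_components`, changes the geometry the selector sees, which is one plausible reason the aggregation and reduction choices affect coreset quality. Note that both aggregation options treat the modalities independently, so neither models inter-modal relationships explicitly.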
Taxonomy
Topics: Industrial Vision Systems and Defect Detection
