Challenges of Multi-Modal Coreset Selection for Depth Prediction

Viktor Moskvoretskii; Narek Alvandian

arXiv:2502.15834 · cs.LG · February 25, 2025

TL;DR

This paper explores the challenges of applying coreset selection techniques to multimodal data for depth prediction, emphasizing the need for specialized methods to handle inter-modal relationships.

Contribution

The paper adapts a state-of-the-art coreset selection method to multimodal data and analyzes the challenges of extending unimodal algorithms to multimodal scenarios.

Findings

01 Multimodal coreset selection faces unique challenges compared to unimodal cases.

02 Embedding aggregation and dimensionality reduction impact coreset effectiveness.

03 Specialized methods are needed to better capture inter-modal relationships.

Abstract

Coreset selection methods are effective in accelerating training and reducing memory requirements but remain largely unexplored in applied multimodal settings. We adapt a state-of-the-art (SoTA) coreset selection technique for multimodal data, focusing on the depth prediction task. Our experiments with embedding aggregation and dimensionality reduction approaches reveal the challenges of extending unimodal algorithms to multimodal scenarios, highlighting the need for specialized methods to better capture inter-modal relationships.
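
To make the pipeline described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes two modalities whose per-sample embeddings are already computed (the names `rgb_emb` and `depth_emb`, the random data, and all sizes are hypothetical), fuses them by concatenation or averaging, reduces dimensionality with PCA, and then selects a subset. The selection step uses plain k-center greedy as a generic stand-in, since the abstract does not name the SoTA technique the paper adapts.

```python
import numpy as np


def aggregate(rgb_emb, depth_emb, mode="concat"):
    """Fuse per-sample embeddings from two modalities (hypothetical helper)."""
    if mode == "concat":
        return np.concatenate([rgb_emb, depth_emb], axis=1)
    if mode == "mean":
        # Averaging requires both modalities to share an embedding width.
        return (rgb_emb + depth_emb) / 2.0
    raise ValueError(f"unknown mode: {mode}")


def pca_reduce(x, n_components):
    """Project centered data onto its top principal directions via SVD."""
    centered = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def k_center_greedy(x, budget, seed=0):
    """Generic stand-in selector: repeatedly pick the point farthest
    from the already selected set (not the method used in the paper)."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(len(x)))
    selected = [first]
    min_dist = np.linalg.norm(x - x[first], axis=1)
    while len(selected) < budget:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        # Track each point's distance to its nearest selected center.
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return selected


# Synthetic embeddings stand in for real encoder outputs (assumption).
rng = np.random.default_rng(0)
rgb_emb = rng.normal(size=(200, 32))
depth_emb = rng.normal(size=(200, 32))

fused = aggregate(rgb_emb, depth_emb, mode="concat")
reduced = pca_reduce(fused, n_components=8)
coreset_idx = k_center_greedy(reduced, budget=20)
```

Swapping `mode="concat"` for `mode="mean"`, or changing `n_components`, is the kind of aggregation and dimensionality-reduction choice the abstract reports as affecting results. Note that neither fusion option models how the modalities relate to each other, which is the limitation the paper highlights.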

Code & Models

Repositories

VityaVitalich/MultiModalCoreset (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection