TAROT: Targeted Data Selection via Optimal Transport
Lan Feng, Fan Nie, Yuejiang Liu, Alexandre Alahi

TL;DR
TAROT is a targeted data selection method grounded in optimal transport theory. It handles complex, multimodal target distributions and outperforms existing influence-based heuristics across several deep learning tasks.
Contribution
The paper presents TAROT, a targeted data selection framework that combines whitened feature distance with optimal transport to improve selection when the target data is multimodal.
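The sketch below shows how a whitened feature distance stops a single high-variance feature direction from dominating distances. It assumes that "whitened feature distance" means Euclidean distance after ZCA whitening with the candidate pool's covariance, which is equivalent to a Mahalanobis distance. The paper's exact construction may differ. The names fit_whitening and whitened_distance are hypothetical, and the 2-D features are toy data.

```python
import numpy as np


def fit_whitening(pool, eps=1e-6):
    """Hypothetical helper: estimate a mean and ZCA whitening matrix from pool features.

    After x -> (x - mu) @ W the pool has (approximately) identity covariance,
    so no single high-variance direction dominates distances.
    """
    mu = pool.mean(axis=0)
    cov = np.cov(pool - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric covariance -> eigh
    # ZCA: W = U diag(1 / sqrt(lambda + eps)) U^T, so W^T W = (cov + eps I)^-1
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mu, W


def whitened_distance(a, b, mu, W):
    """Pairwise Euclidean distances between rows of a and b after whitening."""
    a_w = (a - mu) @ W
    b_w = (b - mu) @ W
    diff = a_w[:, None, :] - b_w[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy pool: feature 0 has std 100, feature 1 has std 1 (feature 0 dominates).
rng = np.random.default_rng(0)
pool = rng.normal(size=(5000, 2)) * np.array([100.0, 1.0])
mu, W = fit_whitening(pool)

target = np.array([[0.0, 0.0]])
cands = np.array([[10.0, 0.0],   # 0.1 std away along the dominant axis
                  [0.0, 2.0]])   # 2 std away along the small axis

raw = np.linalg.norm(target[:, None, :] - cands[None, :, :], axis=-1)
white = whitened_distance(target, cands, mu, W)
print(raw, white)
```

In this toy case the raw distance prefers the candidate that moves 2 units along the low-variance axis, while the whitened distance prefers the one that moves 10 units along an axis whose spread is about 100, because that shift is small relative to the pool's variation in that direction.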
Findings
TAROT outperforms state-of-the-art methods across multiple tasks.
It effectively handles multimodal and complex data distributions.
The method provides reliable estimates of optimal selection ratios.
Abstract
We propose TAROT, a targeted data selection framework grounded in optimal transport theory. Previous targeted data selection methods primarily rely on influence-based greedy heuristics to enhance domain-specific performance. While effective on limited, unimodal data (i.e., data following a single pattern), these methods struggle as target data complexity increases. Specifically, in multimodal distributions, these heuristics fail to account for multiple inherent patterns, leading to suboptimal data selection. This work identifies two primary factors contributing to this limitation: (i) the disproportionate impact of dominant feature components in high-dimensional influence estimation, and (ii) the restrictive linear additive assumptions inherent in greedy selection strategies. To address these challenges, TAROT incorporates whitened feature distance to mitigate dominant feature bias,…
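The small sketch below makes the contrast with single-pattern, additive scoring concrete by treating selection as an optimal transport problem. It assumes uniform weights on the target points and on the selected points, with a squared-Euclidean ground cost. In that special case the OT cost between the target and a subset of matching size reduces to an optimal assignment, so the cost-minimizing subset can be found exactly with SciPy's linear_sum_assignment. This illustrates the idea and is not TAROT's actual algorithm. The name ot_select and the toy data are hypothetical, and in practice the features would presumably be whitened first.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ot_select(target, pool, per_target=1):
    """Hypothetical helper: pick len(target) * per_target pool indices that
    minimize the exact OT cost between the uniform distribution on `target`
    and the uniform distribution on the selected subset.

    Illustrates OT-based subset selection; not TAROT's exact procedure.
    Requires len(pool) >= len(target) * per_target.
    """
    # Each target point receives `per_target` distinct candidates, so repeat its row.
    reps = np.repeat(target, per_target, axis=0)
    cost = ((reps[:, None, :] - pool[None, :, :]) ** 2).sum(axis=-1)
    # Rectangular assignment: every (repeated) target row gets a distinct pool column.
    _, cols = linear_sum_assignment(cost)
    return np.sort(cols)


# Bimodal target: two clusters far apart on the x-axis.
target = np.array([[0.0, 0.0], [0.2, 0.0], [10.0, 0.0], [10.2, 0.0]])
pool = np.array([
    [5.0, 0.0], [5.1, 0.0], [4.9, 0.0], [5.0, 0.1],  # 0-3: near the target mean
    [0.1, 0.1], [0.3, -0.1],                          # 4-5: mode A
    [10.1, 0.1], [9.9, -0.1],                         # 6-7: mode B
])

chosen = ot_select(target, pool)

# Contrast: a single-pattern score that ranks candidates by distance to the target mean.
mean_score = np.linalg.norm(pool - target.mean(axis=0), axis=1)
baseline = np.sort(np.argsort(mean_score)[: len(target)])
print(chosen, baseline)
```

On this bimodal toy target, ranking by distance to the target mean picks the four candidates that sit between the two clusters and match neither pattern, whereas the transport-based choice takes two candidates from each mode. An exact assignment solver scales poorly with pool size, so large-scale use would call for an approximate or entropic OT solver.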
