Eigen-Value: Efficient Domain-Robust Data Valuation via Eigenvalue-Based Approach
Youngjun Choi, Joonseong Kang, Sungjun Lim, Kyungwoo Song

TL;DR
The paper introduces Eigen-Value (EV), a spectral method for data valuation that improves out-of-distribution (OOD) robustness using only in-distribution (ID) data, at low computational cost.
Contribution
EV provides a novel spectral approximation of domain discrepancy and estimates data point contributions efficiently, improving OOD robustness without extra training.
Findings
EV achieves better OOD robustness on real-world datasets.
EV maintains stable data value rankings across domain shifts.
EV is computationally lightweight and practical for large-scale applications.
Abstract
Data valuation has become central in the era of data-centric AI. It drives efficient training pipelines and enables objective pricing in data markets by assigning a numeric value to each data point. Most existing data valuation methods estimate the effect of removing individual data points by evaluating changes in model validation performance under in-distribution (ID) settings, as opposed to out-of-distribution (OOD) scenarios where data follow different patterns. Since ID and OOD data behave differently, data valuation methods based on ID loss often fail to generalize to OOD settings, particularly when the validation set contains no OOD data. Furthermore, although OOD-aware methods exist, they involve heavy computational costs, which hinder practical deployment. To address these challenges, we introduce Eigen-Value (EV), a plug-and-play data valuation framework for OOD…
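To make the removal-based baseline described in the abstract concrete, here is a minimal sketch of leave-one-out valuation against an ID validation loss. It is not the EV method, whose details are not given here. The ridge-regression model, the synthetic data, and the names `fit_ridge`, `val_loss`, and `leave_one_out_values` are all assumptions made for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression (illustrative stand-in for any model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def val_loss(w, X_val, y_val):
    """Mean squared error on the (in-distribution) validation set."""
    return float(np.mean((X_val @ w - y_val) ** 2))


def leave_one_out_values(X, y, X_val, y_val, lam=1e-3):
    """Value of point i = validation loss without i minus loss with all points.

    Positive value: removing the point hurts, so the point is useful.
    Negative value: removing the point helps, so the point is harmful.
    """
    base = val_loss(fit_ridge(X, y, lam), X_val, y_val)
    values = np.empty(len(X))
    for i in range(len(X)):
        keep = np.arange(len(X)) != i  # boolean mask dropping point i
        w_without = fit_ridge(X[keep], y[keep], lam)
        values[i] = val_loss(w_without, X_val, y_val) - base
    return values


# Synthetic ID data (hypothetical): linear targets with small noise.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(30, 2))
X[0] = [1.0, 1.0]          # fix the location of the point we corrupt
y = X @ w_true + 0.01 * rng.normal(size=30)
y[0] += 10.0               # corrupt one training label

X_val = rng.normal(size=(50, 2))
y_val = X_val @ w_true

values = leave_one_out_values(X, y, X_val, y_val)
worst = int(np.argmin(values))  # index of the most harmful training point
```

In this toy setup the corrupted point receives the lowest value because removing it lowers the validation loss. The sketch also shows the two limitations the abstract raises: the values depend entirely on the ID validation set, and each value costs one full refit.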
