Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
Yijun Dong, Hoang Phan, Xiang Pan, Qi Lei

TL;DR
This paper introduces Sketchy Moment Matching (SkMM), a scalable data selection method for finetuning that balances bias and variance reduction through gradient sketching and moment matching, with theoretical guarantees and empirical validation.
Contribution
The paper proposes SkMM, a novel two-stage data selection approach that efficiently balances bias and variance in high-dimensional finetuning with provable guarantees.
Findings
SkMM preserves fast-rate generalization independent of parameter dimension.
Gradient sketching is fast and provably accurate for bias control.
Empirical results show SkMM improves finetuning performance on vision tasks.
Abstract
We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace; (ii) then the variance is reduced over this subspace via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting samples by reducing variance over…
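The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-sample gradients are already available as a NumPy array, uses a Gaussian random projection for the sketching stage, and swaps in a simple greedy heuristic for the moment-matching stage. All function names (sketch_gradients, greedy_moment_matching, moment_error) are hypothetical and chosen only for illustration.

```python
import numpy as np


def sketch_gradients(grads, sketch_dim, rng):
    """Stage 1 (assumed form): project per-sample gradients (N x r)
    down to sketch_dim columns with a Gaussian random matrix."""
    r = grads.shape[1]
    proj = rng.standard_normal((r, sketch_dim)) / np.sqrt(sketch_dim)
    return grads @ proj


def greedy_moment_matching(feats, n_select):
    """Stage 2 (greedy stand-in, not the paper's procedure): pick rows
    whose averaged second moment stays close to that of the full set."""
    n_total, m = feats.shape
    target = feats.T @ feats / n_total
    # Per-sample outer products, shape (N, m, m).
    outer = np.einsum("ni,nj->nij", feats, feats)
    running = np.zeros((m, m))
    available = np.ones(n_total, dtype=bool)
    selected = []
    for k in range(1, n_select + 1):
        # Second moment of the subset if each candidate were added next.
        candidates = (running[None, :, :] + outer) / k
        err = np.linalg.norm(candidates - target[None, :, :], axis=(1, 2))
        err[~available] = np.inf  # never pick the same sample twice
        best = int(np.argmin(err))
        selected.append(best)
        available[best] = False
        running += outer[best]
    return np.array(selected)


def moment_error(feats, idx):
    """Frobenius distance between full and selected second moments."""
    full = feats.T @ feats / feats.shape[0]
    sub = feats[idx].T @ feats[idx] / len(idx)
    return float(np.linalg.norm(full - sub))


# Synthetic stand-in for per-sample gradients: 200 samples, 500 parameters.
rng = np.random.default_rng(0)
grads = rng.standard_normal((200, 500))
sketched = sketch_gradients(grads, sketch_dim=8, rng=rng)
chosen = greedy_moment_matching(sketched, n_select=20)
print("selected indices:", chosen)
print("moment mismatch:", moment_error(sketched, chosen))
```

The greedy loop materializes all per-sample outer products, so it only suits small sketch dimensions and dataset sizes; it is meant to make the "sketch, then match moments" structure concrete rather than to scale.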
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Human Motion and Animation
