TL;DR
GIST introduces a subspace alignment method for targeted data selection in instruction tuning, effectively capturing parameter coupling and improving efficiency over existing approaches.
Contribution
The paper proposes GIST, a novel subspace-based data selection method that accounts for cross-parameter interactions in PEFT, outperforming axis-aligned influence measures.
Findings
GIST matches or outperforms state-of-the-art baselines.
GIST requires only 0.29% of the storage and 25% of the computational time compared with the state-of-the-art baselines.
GIST effectively captures low-dimensional task-relevant update directions.
Abstract
Targeted data selection has emerged as a crucial paradigm for efficient instruction tuning, aiming to identify a small yet influential subset of training examples for a specific target task. In practice, influence is often measured through the effect of an example on parameter updates. To make selection scalable, many approaches leverage optimizer statistics (e.g., Adam states) as an axis-aligned surrogate for update geometry (i.e., a diagonal preconditioner), implicitly treating parameters as coordinate-wise independent. We show that this assumption breaks down in parameter-efficient fine-tuning (PEFT) methods such as LoRA. In this setting, the induced optimization geometry exhibits strong cross-parameter coupling with non-trivial off-diagonal interactions, while the task-relevant update directions are confined to a low-dimensional subspace. Motivated by this mismatch, we propose GIST…
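The contrast the abstract draws, between an axis-aligned (diagonal) surrogate and a low-dimensional subspace, can be illustrated with a small synthetic sketch. This is not GIST's algorithm, which the truncated abstract does not specify. The function names `diagonal_scores` and `subspace_scores`, the cosine-similarity scoring, the SVD-based basis, and the synthetic gradients are all assumptions made for illustration only.

```python
import numpy as np

def diagonal_scores(train_grads, target_grad, second_moment, eps=1e-8):
    """Axis-aligned score (illustrative): cosine similarity after
    per-coordinate, Adam-like scaling by 1 / sqrt(second moment)."""
    scale = 1.0 / (np.sqrt(second_moment) + eps)
    t = train_grads * scale                      # (n, d)
    v = target_grad * scale                      # (d,)
    return (t @ v) / (np.linalg.norm(t, axis=1) * np.linalg.norm(v) + eps)

def subspace_scores(train_grads, target_grads, k, eps=1e-8):
    """Subspace score (illustrative): cosine similarity after projecting
    onto the top-k right singular vectors of the target gradients."""
    _, _, vt = np.linalg.svd(target_grads, full_matrices=False)
    basis = vt[:k]                               # (k, d), orthonormal rows
    t = train_grads @ basis.T                    # (n, k)
    v = target_grads.mean(axis=0) @ basis.T      # (k,)
    return (t @ v) / (np.linalg.norm(t, axis=1) * np.linalg.norm(v) + eps)

# Synthetic setup (assumption): gradients lie near a rank-r subspace whose
# directions mix many coordinates, i.e. parameters are coupled.
rng = np.random.default_rng(0)
d, r, n = 64, 4, 200
mixing = rng.normal(size=(r, d))                 # dense mixing matrix
target_grads = (rng.normal(size=(32, r)) + 1.0) @ mixing
train_grads = rng.normal(size=(n, r)) @ mixing + 0.1 * rng.normal(size=(n, d))
second_moment = (train_grads ** 2).mean(axis=0)  # stand-in for an Adam state

diag = diagonal_scores(train_grads, target_grads.mean(axis=0), second_moment)
sub = subspace_scores(train_grads, target_grads, k=r)

# Indices of the ten highest-scoring training examples under each score.
top_diag = np.argsort(-diag)[:10]
top_sub = np.argsort(-sub)[:10]
print("overlap in top-10:", len(set(top_diag) & set(top_sub)))
```

The diagonal score rescales each coordinate independently, so it cannot represent interactions between coordinates. The subspace score first rotates into a basis estimated from the target gradients and discards everything outside it. How large the gap between the two selections is will depend on the data; the sketch only shows where the two computations differ.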
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
