GIST: Targeted Data Selection for Instruction Tuning via Coupled Optimization Geometry

Guanghui Min; Tianhao Huang; Ke Wan; Chen Chen

arXiv:2602.18584 · cs.LG · May 19, 2026

TL;DR

GIST introduces a subspace alignment method for targeted data selection in instruction tuning, effectively capturing parameter coupling and improving efficiency over existing approaches.

Contribution

The paper proposes GIST, a novel subspace-based data selection method that accounts for cross-parameter interactions in PEFT, outperforming axis-aligned influence measures.

Findings

01

GIST matches or outperforms state-of-the-art baselines.

02

GIST requires only 0.29% of the storage and 25% of the computational time of the state-of-the-art baselines it is compared against.

03

GIST effectively captures low-dimensional task-relevant update directions.

Abstract

Targeted data selection has emerged as a crucial paradigm for efficient instruction tuning, aiming to identify a small yet influential subset of training examples for a specific target task. In practice, influence is often measured through the effect of an example on parameter updates. To make selection scalable, many approaches leverage optimizer statistics (e.g., Adam states) as an axis-aligned surrogate for update geometry (i.e., a diagonal preconditioner), implicitly treating parameters as coordinate-wise independent. We show that this assumption breaks down in parameter-efficient fine-tuning (PEFT) methods such as LoRA. In this setting, the induced optimization geometry exhibits strong cross-parameter coupling with non-trivial off-diagonal interactions, while the task-relevant update directions are confined to a low-dimensional subspace. Motivated by this mismatch, we propose GIST…
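The point about axis-aligned surrogates can be made concrete with a two-parameter toy problem. The sketch below is not the GIST algorithm and is not taken from the paper. It only assumes a hypothetical 2x2 curvature matrix `H` with strong off-diagonal terms, a hypothetical target gradient, and two hypothetical candidate training gradients. It scores each candidate once with only the diagonal of `H` and once with the full matrix, then compares the rankings.

```python
# Toy illustration only: H, the gradients, and all function names are
# hypothetical and chosen by hand; this is not the paper's method.

def diag_score(target, grad, precond):
    """Influence-style score that uses only the diagonal of the preconditioner,
    i.e. it treats the two parameters as independent."""
    return sum(t * g / precond[i][i] for i, (t, g) in enumerate(zip(target, grad)))

def full_score(target, grad, precond):
    """Same score, but with the full 2x2 preconditioner (off-diagonals included)."""
    (a, b), (c, d) = precond
    det = a * d - b * c
    # Solve precond @ x = grad with the closed-form 2x2 inverse.
    x0 = (d * grad[0] - b * grad[1]) / det
    x1 = (-c * grad[0] + a * grad[1]) / det
    return target[0] * x0 + target[1] * x1

# Strongly coupled geometry: off-diagonal entries are close to the diagonal ones.
H = [[1.0, 0.9], [0.9, 1.0]]
target = [1.0, 0.0]
candidates = {"a": [1.0, 0.9], "b": [0.8, -0.5]}

diag_ranking = sorted(
    candidates, key=lambda k: diag_score(target, candidates[k], H), reverse=True
)
full_ranking = sorted(
    candidates, key=lambda k: full_score(target, candidates[k], H), reverse=True
)
print("diagonal-only ranking:", diag_ranking)
print("full-matrix ranking:  ", full_ranking)
```

In this hand-picked example the diagonal-only score prefers candidate `a`, while the full-matrix score prefers candidate `b`. Ignoring off-diagonal coupling can therefore change which example is selected, even in two dimensions.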

Code & Models

Repositories

guanghuimin/GIST (GitHub)

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning