GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection
Pingbang Hu, Joseph Melkonian, Weijing Tang, Han Zhao, Jiaqi W. Ma

TL;DR
GraSS is a gradient compression method that exploits the sparsity of per-sample gradients to make data attribution fast and scalable for large models, cutting computational cost while preserving attribution accuracy.
Contribution
The paper presents GraSS, a gradient compression algorithm with sub-linear space and time complexity for data attribution, together with FactGraSS, a variant specialized for linear layers, enabling scalable influence analysis.
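Why linear layers admit a specialized variant can be illustrated with a small sketch. For a linear layer y = W x, the per-sample gradient with respect to W is the outer product of the upstream gradient g and the input x, so each factor can be compressed separately without ever forming the full matrix. The code below is only an illustration of that identity, not the paper's algorithm: the function names, the dense Gaussian projection matrices `P_in` and `P_out`, and all dimensions are assumptions chosen for the example.

```python
import numpy as np

def compress_factors(x, g, P_in, P_out):
    """Compress the per-sample gradient g x^T of a linear layer
    by projecting each factor separately (full gradient never formed)."""
    return np.outer(P_out @ g, P_in @ x)

def compress_full(x, g, P_in, P_out):
    """Reference path: materialise the d_out x d_in gradient first,
    then project on both sides. Used here only to check the identity."""
    return P_out @ np.outer(g, x) @ P_in.T

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
d_in, d_out, k_in, k_out = 512, 256, 8, 4
x = rng.standard_normal(d_in)    # layer input for one sample
g = rng.standard_normal(d_out)   # gradient of the loss w.r.t. the layer output
P_in = rng.standard_normal((k_in, d_in))
P_out = rng.standard_normal((k_out, d_out))

small = compress_factors(x, g, P_in, P_out)
```

Both paths give the same k_out × k_in result, but the factored path works on vectors of length d_in and d_out rather than on a matrix with d_in · d_out entries.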
Findings
FactGraSS achieves up to 165% faster throughput on billion-scale models than previous state-of-the-art baselines.
Sub-linear space and time complexity for gradient-based attribution.
Maintains influence fidelity despite compression.
Abstract
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, designed specifically for linear layers. Both explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to the previous state-of-the-art baselines. Our code is publicly available at…
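The two ingredients named in the title, gradient sparsification and sparse projection, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: it assumes sparsification is a fixed random coordinate mask and that the sparse projection is a count-sketch-style map (one signed bucket per kept coordinate). All function names and sizes are hypothetical.

```python
import numpy as np

def make_random_mask(dim, k_keep, rng):
    """Choose k_keep coordinate indices once; the same mask is reused for every sample."""
    return np.sort(rng.choice(dim, size=k_keep, replace=False))

def make_sparse_projection(in_dim, out_dim, rng):
    """Count-sketch-style map: each input coordinate gets one output bucket and a random sign."""
    buckets = rng.integers(0, out_dim, size=in_dim)
    signs = rng.choice(np.array([-1.0, 1.0]), size=in_dim)
    return buckets, signs

def compress(grad, mask_idx, buckets, signs, out_dim):
    """Sparsify by masking, then accumulate signed values into their buckets."""
    kept = grad[mask_idx]                  # reads only k_keep coordinates
    out = np.zeros(out_dim, dtype=grad.dtype)
    np.add.at(out, buckets, signs * kept)  # unbuffered add handles repeated buckets
    return out

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
dim, k_keep, out_dim = 1000, 64, 16
mask = make_random_mask(dim, k_keep, rng)
buckets, signs = make_sparse_projection(k_keep, out_dim, rng)

g1 = rng.standard_normal(dim)
g2 = rng.standard_normal(dim)
c1 = compress(g1, mask, buckets, signs, out_dim)
c2 = compress(g2, mask, buckets, signs, out_dim)
```

In this sketch the per-sample work scales with `k_keep` rather than with the full gradient dimension, and the map is linear, so compressed gradients can still be added or compared by inner product.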
Taxonomy
Topics: Computer Graphics and Visualization Techniques
