$\texttt{dattri}$: A Library for Efficient Data Attribution
Junwei Deng, Ting-Wei Li, Shiyuan Zhang, Shixuan Liu, Yijun Pan, Hao Huang, Xinhe Wang, Pingbang Hu, Xingjian Zhang, Jiaqi W. Ma

TL;DR
dattri is an open-source library that simplifies the development, benchmarking, and deployment of data attribution methods for AI models, supporting large-scale models and various benchmark settings.
Contribution
The paper introduces $\texttt{dattri}$, a comprehensive library with a unified API, modular utilities, and benchmarking tools for data attribution in AI.
Findings
Implemented state-of-the-art data attribution methods.
Facilitated comprehensive benchmark analysis.
Supported large-scale neural network models.
Abstract
Data attribution methods aim to quantify the influence of individual training samples on the predictions of artificial intelligence (AI) models. As training data plays an increasingly crucial role in the modern development of large-scale AI models, data attribution has found broad applications in improving AI performance and safety. However, despite a recent surge of new data attribution methods, a comprehensive library that facilitates the development, benchmarking, and deployment of different data attribution methods is still lacking. In this work, we introduce $\texttt{dattri}$, an open-source data attribution library that addresses the above needs. Specifically, $\texttt{dattri}$ highlights three novel design features. Firstly, $\texttt{dattri}$ proposes a unified and easy-to-use API, allowing users to integrate different data attribution methods into their PyTorch-based…
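To make "quantifying the influence of individual training samples" concrete, here is a minimal sketch of leave-one-out attribution, the brute-force baseline that efficient attribution methods typically try to approximate. It is not the $\texttt{dattri}$ API: the function names `fit_slope` and `leave_one_out_scores`, the one-parameter model, and the toy data are all assumptions made for illustration.

```python
def fit_slope(xs, ys):
    """Least-squares slope for the toy model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def leave_one_out_scores(xs, ys, x_test):
    """Score each training point by how much removing it changes the test prediction."""
    full_pred = fit_slope(xs, ys) * x_test
    scores = []
    for i in range(len(xs)):
        # Retrain without training point i (hypothetical toy setup, not dattri code).
        xs_wo = xs[:i] + xs[i + 1:]
        ys_wo = ys[:i] + ys[i + 1:]
        scores.append(full_pred - fit_slope(xs_wo, ys_wo) * x_test)
    return scores


train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 2.0, 3.0, 8.0]  # the last point deviates from y = x
scores = leave_one_out_scores(train_x, train_y, x_test=1.0)

# Index of the training point whose removal moves the prediction the most.
most_influential = max(range(len(scores)), key=lambda i: abs(scores[i]))
```

Retraining once per training sample is only feasible for toy models like this one, which is why approximate methods and shared tooling for benchmarking them matter at scale.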
