First is Better Than Last for Language Data Influence
Chih-Kuan Yeh, Ankur Taly, Mukund Sundararajan, Frederick Liu, Pradeep Ravikumar

TL;DR
This paper introduces TracIn-WE, a method that improves influence estimation for NLP models by operating on word embeddings instead of last-layer weights, reducing cancellation effects and enhancing interpretability.
Contribution
The paper proposes TracIn-WE, a novel influence measure that operates on embedding layers, addressing cancellation issues and improving influence detection in large language models.
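A minimal sketch of the idea follows. It assumes TracIn's standard form (a sum over saved checkpoints of learning rate times the dot product of two examples' loss gradients) restricted to the word-embedding matrix. It also assumes the per-token gradient rows have already been extracted from the model at each checkpoint. A token that does not occur in an example receives zero gradient in its embedding row, so the dot product over the whole matrix reduces to a sum over tokens shared by the training and test example. Each summand then serves as a natural word-level score. The names `tracin_we` and `EmbGrads` and the input layout are hypothetical, and the paper's estimator may include refinements not shown here.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical layout: token id -> gradient of one example's loss with respect
# to that token's row of the word-embedding matrix (rows of absent tokens are zero).
EmbGrads = Dict[int, Vector]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def tracin_we(
    checkpoints: List[Tuple[float, EmbGrads, EmbGrads]],
) -> Tuple[float, Dict[int, float]]:
    """Return (total influence, per-token influence) of a training example on a
    test example, restricted to the word-embedding matrix.

    Each checkpoint is (learning_rate, train_grads, test_grads).
    """
    per_token: Dict[int, float] = {}
    for lr, g_train, g_test in checkpoints:
        # Only embedding rows touched by both examples contribute to the dot product.
        for tok in g_train.keys() & g_test.keys():
            contrib = lr * dot(g_train[tok], g_test[tok])
            per_token[tok] = per_token.get(tok, 0.0) + contrib
    return sum(per_token.values()), per_token


# Two toy checkpoints with made-up gradients (token 3 is never in both examples).
ckpts = [
    (0.1, {7: [1.0, 0.0], 3: [0.5, 0.5]}, {7: [2.0, 1.0], 9: [1.0, 1.0]}),
    (0.05, {7: [1.0, 1.0]}, {7: [1.0, -1.0], 3: [4.0, 0.0]}),
]
score, words = tracin_we(ckpts)  # score == 0.2, words == {7: 0.2}
```

Summing over checkpoints mirrors TracIn. Keeping the per-token map is what allows word-level attribution, for example ranking which shared words drive a training example's influence.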
Findings
TracIn-WE outperforms last-layer influence methods on deletion metrics.
It provides influence scores at the word level within training examples.
The method is effective across multiple NLP classification tasks.
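The deletion metric referred to above can be sketched generically. This minimal sketch assumes one common form of the metric: delete the k training examples with the highest positive influence on a test example, retrain, and measure how much the model's score for that example drops. A more discriminative influence measure should produce a larger drop. The names `deletion_effect`, `top_k_proponents`, and `retrain_and_eval` are hypothetical. The toy `label_prior` "model" only stands in for real retraining, and the paper's exact protocol (choice of k, whether opponents are also deleted, which quantity is measured) may differ.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (text, label); hypothetical format


def top_k_proponents(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring training examples, keeping only
    positive (helpful) influence."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] > 0]


def deletion_effect(
    scores: Sequence[float],
    k: int,
    train_set: List[Example],
    retrain_and_eval: Callable[[List[Example]], float],
) -> float:
    """Drop in the test-example score after deleting the top-k proponents.

    retrain_and_eval trains a fresh model on the given examples and returns a
    score for a fixed test example (e.g. the probability of its true label).
    """
    removed = set(top_k_proponents(scores, k))
    kept = [ex for i, ex in enumerate(train_set) if i not in removed]
    return retrain_and_eval(train_set) - retrain_and_eval(kept)


def label_prior(examples: List[Example]) -> float:
    # Toy stand-in for retraining: the "probability" of label 1 is its frequency.
    return sum(label for _, label in examples) / len(examples)


train = [("great", 1), ("fine", 1), ("awful", 0), ("loved it", 1), ("meh", 0)]
influence = [0.9, 0.1, -0.5, 0.7, -0.2]  # made-up scores for a positive test example
drop = deletion_effect(influence, 2, train, label_prior)  # about 0.267 (0.6 - 1/3)
```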
Abstract
The ability to identify influential training examples enables us to debug training data and explain model behavior. Existing techniques to do so are based on the flow of training data influence through the model parameters. For large models in NLP applications, it is often computationally infeasible to study this flow through all model parameters, so techniques usually pick the last layer of weights. However, we observe that since the activation connected to the last layer of weights contains "shared logic", the data influence calculated via the last-layer weights is prone to a "cancellation effect", where the data influences of different examples have large magnitudes that contradict each other. The cancellation effect lowers the discriminative power of the influence score, and deleting influential examples according to this measure often does not change the model's behavior by…
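To see where the cancellation can come from, consider TracIn restricted to the last weight matrix W of a softmax classifier with logits W a, where a is the final activation. The gradient of the cross-entropy loss with respect to W is the outer product (p - y) aᵀ. The influence between two examples therefore factors into the dot product of their error vectors times the dot product of their activations. The sketch below computes that factorized score for a single checkpoint. The helper names `last_layer_influence` and `error_vector` are hypothetical, and the toy numbers are invented, not taken from the paper. When activations share a large common component (a stand-in for the "shared logic" described above), every activation dot product is large. Examples from different classes then get large scores of opposite sign, while the class-specific part of the activation barely moves either score.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def error_vector(probs: Vector, label: int) -> List[float]:
    # Gradient of softmax cross-entropy with respect to the logits: p - onehot(y).
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def last_layer_influence(
    act_train: Vector, probs_train: Vector, y_train: int,
    act_test: Vector, probs_test: Vector, y_test: int,
    lr: float = 1.0,
) -> float:
    """One-checkpoint TracIn on the final weight matrix W (logits = W @ act).

    dLoss/dW = (p - y) act^T, and the Frobenius inner product of two outer
    products u a^T and v b^T equals (u . v) * (a . b).
    """
    e_train = error_vector(probs_train, y_train)
    e_test = error_vector(probs_test, y_test)
    return lr * dot(e_train, e_test) * dot(act_train, act_test)


# Toy activations: a large shared component [10, 10] plus a small
# class-specific third coordinate (+1 for class 0, -1 for class 1).
test_act, test_probs, test_y = [10.0, 10.0, 1.0], [0.7, 0.3], 0
pro = last_layer_influence([10.0, 10.0, 1.0], [0.6, 0.4], 0, test_act, test_probs, test_y)
con = last_layer_influence([10.0, 10.0, -1.0], [0.6, 0.4], 1, test_act, test_probs, test_y)
# pro is about 48.2 and con about -71.6; both are dominated by the shared 10*10 + 10*10 term.
```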
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
