Empirical influence functions to understand the logic of fine-tuning
Jordan K. Matelsky, Lyle Ungar, Konrad P. Kording

TL;DR
This paper introduces an empirical method to measure how individual training samples influence neural network outputs during fine-tuning, revealing limitations in current models' ability to generalize and reason logically.
Contribution
It presents a practical approach to quantify influence in neural networks, highlighting violations of desired influence properties in both simple and modern models.
Findings
Influence measures reveal violations of logical and semantic properties in models
Prompting can partially mitigate influence violations
Popular models struggle with generalization and logical reasoning
Abstract
Understanding the process of learning in neural networks is crucial for improving their performance and interpreting their behavior. This can be approximately understood by asking how a model's output is influenced when we fine-tune on a new training sample. There are desiderata for such influences, such as decreasing influence with semantic distance, sparseness, noise invariance, transitive causality, and logical consistency. Here we use the empirical influence measured using fine-tuning to demonstrate how individual training samples affect outputs. We show that these desiderata are violated both for simple convolutional networks and for a modern LLM. We also illustrate how prompting can partially rescue this failure. Our paper presents an efficient and practical way of quantifying how well neural networks learn from fine-tuning stimuli. Our results suggest that popular models…
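The core idea in the abstract, measuring how outputs change after fine-tuning on one new sample, can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes a two-feature logistic model, a single gradient step as the "fine-tuning", and hypothetical names (`predict`, `fine_tune_step`, `empirical_influence`) chosen only for illustration. Influence is taken here as the difference in model output on each probe input before and after the update.

```python
import math

def predict(w, b, x):
    """Output of a toy logistic model, p(y=1 | x)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_step(w, b, x, y, lr=0.5, steps=1):
    """Return parameters after `steps` gradient steps on one (x, y) sample."""
    w = list(w)  # copy so the original parameters are left untouched
    for _ in range(steps):
        err = predict(w, b, x) - y  # cross-entropy gradient w.r.t. the logit
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b = b - lr * err
    return w, b

def empirical_influence(w, b, train_sample, probes, lr=0.5, steps=1):
    """Change in output on each probe caused by fine-tuning on train_sample."""
    x, y = train_sample
    w_new, b_new = fine_tune_step(w, b, x, y, lr, steps)
    return [predict(w_new, b_new, p) - predict(w, b, p) for p in probes]

# Assumed toy setup: zero-initialised model, one positive training sample.
w0, b0 = [0.0, 0.0], 0.0
sample = ([1.0, 0.0], 1.0)
probes = [
    [1.0, 0.0],   # identical to the training input
    [0.0, 1.0],   # orthogonal to the training input
    [-1.0, 0.0],  # opposite of the training input
]
influence = empirical_influence(w0, b0, sample, probes)
for p, delta in zip(probes, influence):
    print(p, round(delta, 4))
```

In this toy case the influence is largest on the probe that matches the training input and smaller on the orthogonal probe, which is the kind of "decreasing influence with distance" behavior the desiderata describe. The paper's point is that real networks do not reliably behave this way.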
Taxonomy
Topics: Neural Networks and Applications · Explainable Artificial Intelligence (XAI) · Neural Networks and Reservoir Computing
