First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation
Dmytro Vitel, Anshuman Chhabra

TL;DR
This paper challenges the earlier conclusion of Yeh et al. (2022) that the first (embedding) layers are the most informative for influence estimation. It shows that middle layers of large language models are more effective than first or last layers, and it introduces improved aggregation and evaluation methods.
Contribution
The work provides theoretical and empirical evidence that middle layers outperform first layers for influence estimation and proposes new aggregation and evaluation techniques.
Findings
Middle layers are better influence estimators than first layers.
Alternative aggregation methods improve influence score performance.
The Noise Detection Rate (NDR) effectively evaluates influence scores.
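The findings above refer to combining per-layer influence scores with operators other than a plain average. This summary does not define the paper's aggregation operators, so the following is only a minimal sketch of what mean, rank-based, and vote-based aggregation could look like. The names `LayerScores`, `mean_aggregate`, `rank_aggregate`, and `vote_aggregate`, the parameter `k`, and the convention that a higher score means a more suspicious sample are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical container: layer name -> one score per training sample.
# Assumed convention: a higher score means the sample looks more suspicious.
LayerScores = Dict[str, List[float]]


def _ranks(scores: List[float]) -> List[int]:
    """Rank of each sample within one layer (0 = highest score, ties by index)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for position, sample in enumerate(order):
        ranks[sample] = position
    return ranks


def mean_aggregate(layer_scores: LayerScores) -> List[float]:
    """Baseline: average the raw scores across layers."""
    layers = list(layer_scores.values())
    n = len(layers[0])
    return [sum(layer[i] for layer in layers) / len(layers) for i in range(n)]


def rank_aggregate(layer_scores: LayerScores) -> List[float]:
    """Average each sample's per-layer rank (lower = more suspicious)."""
    per_layer = [_ranks(scores) for scores in layer_scores.values()]
    n = len(per_layer[0])
    return [sum(r[i] for r in per_layer) / len(per_layer) for i in range(n)]


def vote_aggregate(layer_scores: LayerScores, k: int) -> List[int]:
    """Each layer votes for its top-k samples; return vote counts per sample."""
    n = len(next(iter(layer_scores.values())))
    votes = [0] * n
    for scores in layer_scores.values():
        top_k = sorted(range(n), key=lambda i: -scores[i])[:k]
        for sample in top_k:
            votes[sample] += 1
    return votes


if __name__ == "__main__":
    toy = {
        "layer_0": [0.9, 0.1, 0.2, 0.0],
        "layer_6": [0.2, 0.8, 0.7, 0.1],
        "layer_11": [0.1, 0.9, 0.6, 0.0],
    }
    print(mean_aggregate(toy))
    print(rank_aggregate(toy))
    print(vote_aggregate(toy, k=2))
```

On the toy data, the raw average favors sample 1, while top-2 voting gives sample 2 the most votes. Rank and vote operators use only the ordering within each layer, so a layer with unusually large score magnitudes cannot dominate the result the way it can under averaging.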
Abstract
Identifying how training samples influence Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training sample influence estimation methods (also known as influence functions) pursue this goal by utilizing information flow through the model via its first-order and higher-order gradient terms. However, because today's models contain billions of parameters, these influence computations are often restricted to some subset of model layers to ensure computational feasibility. Prior seminal work by Yeh et al. (2022) on which layers are best suited for computing language data influence concluded that the first (embedding) layers are the most informative for this purpose, using a hypothesis based on influence scores canceling out (i.e., the cancellation effect).…
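To make the idea of restricting influence computation to a subset of layers concrete, here is a minimal sketch of a first-order estimate: the dot product between a training sample's loss gradient and a test sample's loss gradient, taken only over the chosen layers. It omits the higher-order terms the abstract mentions and is not the paper's estimator. The layer names, the gradient values, and the function `layer_influence` are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical format: layer name -> flattened loss gradient for that layer.
Grad = Dict[str, List[float]]


def dot(u: List[float], v: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def layer_influence(train_grad: Grad, test_grad: Grad, layers: Iterable[str]) -> float:
    """First-order influence restricted to `layers`.

    A positive value means a small gradient step on the training sample
    would lower the test loss to first order; a negative value means it
    would raise it.
    """
    return sum(dot(train_grad[name], test_grad[name]) for name in layers)


if __name__ == "__main__":
    # Made-up gradients for a three-layer toy model.
    train_grad = {"embed": [0.5, -1.0], "mid": [2.0, 0.0], "last": [-1.0, 1.0]}
    test_grad = {"embed": [1.0, 1.0], "mid": [0.5, 3.0], "last": [2.0, 0.5]}

    for name in train_grad:
        print(name, layer_influence(train_grad, test_grad, [name]))
    print("all layers", layer_influence(train_grad, test_grad, train_grad.keys()))
```

In this toy case the per-layer scores differ in sign, so the layer subset you pick changes both the magnitude and the direction of the estimated influence. That is why the choice of layer is a substantive modeling decision and not only a cost-saving measure.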
Peer Reviews
Decision: ICLR 2026 Poster
- Addresses a practical question (which layers to use and how to aggregate) with simple, general aggregation operators (Rank/Vote).
- Provides broad empirical evaluation across models/tasks and includes no-retrain proxies (NDR/AUC) that are useful in practice.

- **Related-work positioning should be strengthened.** Add a concise paragraph clarifying scope vs. **knowledge editing** (e.g., ROME, MEND, MEMIT): Explicitly discuss the differences and connections in terms of **“where” (layers/locations)** and **“how” (locality vs. cross-layer aggregation)**. **No new experiments are required**; a brief positioning and citations will suffice.
- **Novelty is relatively simple.** (Aggregation is straightforward; theory is light.)

1. The paper is clearly written and well organized, making it pleasant and easy to follow.
2. The work challenges the established conclusion [1] that the embedding layer is the most informative for LLM data influence estimation. The authors provide both theoretical and empirical analyses showing that the "gradient cancellation effect" can be unreliable in practice, and offer a counterexample (Theorem 5.1). This contributes a fresh perspective and theoretical insight to the field of LLM interpre

1. The current baselines are mostly classical. Comparing with more recent influence estimation approaches would make the study more comprehensive and convincing.
2. Some recent studies [2] restrict gradient computation to specific layers for efficiency reasons. Computing influence across all layers in large LLMs could be computationally expensive. A discussion or quantitative analysis of the computational cost of the proposed approach would strengthen the paper.
3. The current analysis primari

1. The paper provides both theoretical (Theorem 5.1) and empirical evidence challenging prior assumptions about optimal layers for influence estimation.
2. The paper introduces well-motivated aggregation strategies (Rank and Vote) that outperform standard averaging, revealing layer-specific behaviors and improving influence estimation performance.

**1. Limited Evaluation Setting**: The experiments rely solely on synthetically injected label noise (20% uniform flipping) on GLUE benchmarks, which may not reflect real-world data quality issues.
**2. Inconsistent Results Across Models**: The findings show notable inconsistencies, particularly for LLaMA-3.2 1B where influence functions fail to outperform random filtering, and the best-performing layers vary across models, suggesting the conclusions may not generalize.
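The evaluation setting this review describes, synthetic label flipping followed by a check of how many flipped samples the influence scores recover, can be sketched as follows. This summary does not give the paper's exact noise procedure or its definition of the Noise Detection Rate, so both functions below encode assumptions: `flip_labels` reassigns a fixed fraction of labels to a different class chosen uniformly at random, and `detection_rate` reports the fraction of flipped samples found among the `k` samples with the highest suspicion score. All names and parameters are hypothetical.

```python
import random
from typing import List, Set, Tuple


def flip_labels(
    labels: List[int], num_classes: int, rate: float, seed: int = 0
) -> Tuple[List[int], Set[int]]:
    """Flip `rate` of the labels to a different, uniformly chosen class.

    Returns the noisy labels and the set of flipped indices.
    """
    rng = random.Random(seed)
    n_flip = int(round(rate * len(labels)))
    flipped = set(rng.sample(range(len(labels)), n_flip))
    noisy = list(labels)
    for i in flipped:
        alternatives = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, flipped


def detection_rate(suspicion: List[float], flipped: Set[int], k: int) -> float:
    """Assumed metric: share of flipped samples among the top-k most suspicious."""
    if not flipped:
        return 0.0
    top_k = sorted(range(len(suspicion)), key=lambda i: -suspicion[i])[:k]
    return sum(1 for i in top_k if i in flipped) / len(flipped)


if __name__ == "__main__":
    clean = [0, 1] * 10                      # 20 binary labels
    noisy, flipped = flip_labels(clean, num_classes=2, rate=0.2)
    # An oracle scorer that marks exactly the flipped samples as suspicious.
    oracle = [1.0 if i in flipped else 0.0 for i in range(len(clean))]
    print(len(flipped), detection_rate(oracle, flipped, k=len(flipped)))
```

Because the flipped indices are known by construction, a check like this needs no retraining, which matches the first review's description of NDR as a no-retrain proxy. The caveat raised in this review still applies: uniform synthetic flips may not resemble real-world labeling errors.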
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Explainable Artificial Intelligence (XAI)
