First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation

Dmytro Vitel; Anshuman Chhabra

arXiv:2511.04715 · cs.CL · January 29, 2026

TL;DR

This paper challenges previous assumptions by showing that middle layers of large language models are more effective than first or last layers for influence estimation, and introduces improved aggregation and evaluation methods.

Contribution

The work provides theoretical and empirical evidence that middle layers outperform first layers for influence estimation and proposes new aggregation and evaluation techniques.

Findings

01

Middle layers are better influence estimators than first layers.

02

Alternative aggregation methods improve influence score performance.

03

The Noise Detection Rate (NDR) effectively evaluates influence scores.
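The second and third findings can be illustrated with a toy sketch. This page does not give the paper's exact definitions of the Rank and Vote aggregations or of NDR, so everything below is an assumption made for illustration: higher scores are taken to mean "more likely noisy", Rank averages per-layer rank positions, Vote counts how many layers place a sample in their top-k, and `detection_rate` is a hypothetical stand-in for NDR (the fraction of known-noisy samples found among the top-k flagged). All function names are invented here.

```python
def ranks(scores):
    """Rank positions: 0 for the smallest score, len(scores) - 1 for the largest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    out = [0] * len(scores)
    for position, idx in enumerate(order):
        out[idx] = position
    return out


def mean_aggregate(layer_scores):
    """Average each sample's raw score across layers."""
    n_layers = len(layer_scores)
    return [sum(col) / n_layers for col in zip(*layer_scores)]


def rank_aggregate(layer_scores):
    """Assumed 'Rank' scheme: average per-layer rank positions, not raw scores."""
    return mean_aggregate([ranks(s) for s in layer_scores])


def vote_aggregate(layer_scores, k):
    """Assumed 'Vote' scheme: each layer votes for its k highest-scoring samples."""
    votes = [0] * len(layer_scores[0])
    for s in layer_scores:
        top = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        for i in top:
            votes[i] += 1
    return votes


def detection_rate(aggregated, noisy_ids, k):
    """Hypothetical NDR stand-in: share of noisy samples among the top-k flagged."""
    order = sorted(range(len(aggregated)), key=lambda i: aggregated[i], reverse=True)
    flagged = set(order[:k])
    return len(flagged & set(noisy_ids)) / len(noisy_ids)


# Toy data: 3 layers x 5 training samples; samples 1 and 3 are the noisy ones.
# The third layer has one outlier score that dominates a plain average.
layer_scores = [
    [0.1, 0.9, 0.2, 0.8, 0.3],
    [0.2, 0.7, 0.1, 0.9, 0.4],
    [50.0, 0.1, 0.2, 0.3, 0.0],
]
noisy_ids = [1, 3]

mean_ndr = detection_rate(mean_aggregate(layer_scores), noisy_ids, k=2)     # 0.5
rank_ndr = detection_rate(rank_aggregate(layer_scores), noisy_ids, k=2)     # 1.0
vote_ndr = detection_rate(vote_aggregate(layer_scores, 2), noisy_ids, k=2)  # 1.0
```

On this hand-built example, a single layer with an outsized score pulls a clean sample into the mean's top-2, while rank- and vote-based aggregation are unaffected by score magnitude. This shows only why scale-free aggregation can help; it does not reproduce any result from the paper.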

Abstract

Identifying how training samples influence/impact Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Current training sample influence estimation methods (also known as influence functions) undertake this goal by utilizing information flow through the model via its first-order and higher-order gradient terms. However, owing to the large model sizes of today consisting of billions of parameters, these influence computations are often restricted to some subset of model layers to ensure computational feasibility. Prior seminal work by Yeh et al. (2022) in assessing which layers are best suited for computing language data influence concluded that the first (embedding) layers are the most informative for this purpose, using a hypothesis based on influence scores canceling out (i.e., the cancellation effect).…

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Addresses a practical question (which layers to use and how to aggregate) with simple, general aggregation operators (Rank/Vote).
- Provides broad empirical evaluation across models/tasks and includes no-retrain proxies (NDR/AUC) that are useful in practice.

Weaknesses

- **Related-work positioning should be strengthened.** Add a concise paragraph clarifying scope vs. **knowledge editing** (e.g., ROME, MEND, MEMIT): Explicitly discuss the differences and connections in terms of **“where” (layers/locations)** and **“how” (locality vs. cross-layer aggregation)**. **No new experiments are required**; a brief positioning and citations will suffice.
- **Novelty is relatively simple.** (Aggregation is straightforward; theory is light.)

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper is clearly written and well organized, making it pleasant and easy to follow.
2. The work challenges the established conclusion [1] that the embedding layer is the most informative for LLM data influence estimation. The authors provide both theoretical and empirical analyses showing that the "gradient cancellation effect" can be unreliable in practice, and offer a counterexample (Theorem 5.1). This contributes a fresh perspective and theoretical insight to the field of LLM interpre

Weaknesses

1. The current baselines are mostly classical. Comparing with more recent influence estimation approaches would make the study more comprehensive and convincing.
2. Some recent studies [2] restrict gradient computation to specific layers for efficiency reasons. Computing influence across all layers in large LLMs could be computationally expensive. A discussion or quantitative analysis of the computational cost of the proposed approach would strengthen the paper.
3. The current analysis primari

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The paper provides both theoretical (Theorem 5.1) and empirical evidence challenging prior assumptions about optimal layers for influence estimation.
2. The paper introduces well-motivated aggregation strategies (Rank and Vote) that outperform standard averaging, revealing layer-specific behaviors and improving influence estimation performance.

Weaknesses

**1. Limited Evaluation Setting**: The experiments rely solely on synthetically injected label noise (20% uniform flipping) on GLUE benchmarks, which may not reflect real-world data quality issues.
**2. Inconsistent Results Across Models**: The findings show notable inconsistencies, particularly for LLaMA-3.2 1B where influence functions fail to outperform random filtering, and the best-performing layers vary across models, suggesting the conclusions may not generalize.
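
For readers unfamiliar with the evaluation setting the reviewer refers to, the following is a minimal sketch of uniform label flipping at a 20% rate. The paper's actual noise-injection procedure is not described on this page, so the function name `inject_label_noise`, its parameters, and the choice to flip each selected label to a different class chosen uniformly at random are all assumptions.

```python
import random


def inject_label_noise(labels, num_classes, noise_rate=0.2, seed=0):
    """Flip a fixed fraction of labels to a different class, chosen uniformly.

    Returns the noisy label list and the sorted indices that were flipped.
    Hypothetical helper; not taken from the paper.
    """
    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    flipped_ids = sorted(rng.sample(range(len(labels)), n_flip))
    noisy = list(labels)
    for i in flipped_ids:
        # Exclude the original class so every selected sample really changes.
        alternatives = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy, flipped_ids


# Example: 100 binary labels, 20 of which end up flipped.
clean_labels = [0, 1] * 50
noisy_labels, flipped_ids = inject_label_noise(clean_labels, num_classes=2)
```

The returned `flipped_ids` serve as the ground truth against which a noise-detection metric can be scored without retraining the model.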

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Explainable Artificial Intelligence (XAI)