Grad-ELLM: Gradient-based Explanations for Decoder-only LLMs

Xin Huang; Antoni B. Chan

arXiv:2601.03089·cs.CL·January 7, 2026

TL;DR

Grad-ELLM introduces a gradient-based attribution method tailored for decoder-only transformer LLMs, improving faithfulness of input explanations without architectural changes.

Contribution

The paper proposes Grad-ELLM, a gradient-based attribution technique for decoder-only LLMs, together with two new faithfulness metrics for evaluating attribution methods.

Findings

01

Grad-ELLM outperforms existing attribution methods in faithfulness.

02

The method works across multiple tasks and models.

03

New metrics enable fairer comparison of attribution methods.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet their black-box nature raises concerns about transparency and faithfulness. Input attribution methods aim to highlight each input token's contribution to the model's output, but existing approaches are typically model-agnostic and do not focus on transformer-specific architectures, leading to limited faithfulness. To address this, we propose Grad-ELLM, a gradient-based attribution method for decoder-only transformer-based LLMs. By aggregating channel importance from gradients of the output logit with respect to attention layers and spatial importance from attention maps, Grad-ELLM generates heatmaps at each generation step without requiring architectural modifications. Additionally, we introduce two faithfulness metrics, $\pi$-Soft-NC and $\pi$-Soft-NS, which are modifications of…
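The aggregation described in the abstract resembles Grad-CAM-style attribution, so a small sketch may help make the idea concrete. The code below is an illustration under assumptions, not the paper's formulation, which the truncated abstract does not spell out: the function name `gradcam_style_token_scores`, the averaging of gradients over tokens, the ReLU, and the elementwise product with a single attention row are all hypothetical choices made for this example.

```python
from typing import List


def gradcam_style_token_scores(
    features: List[List[float]],
    grads: List[List[float]],
    attn_row: List[float],
) -> List[float]:
    """Toy Grad-CAM-style token attribution (hypothetical helper).

    features: T x C attention-layer outputs for T input tokens.
    grads:    T x C gradients of the target logit w.r.t. those outputs.
    attn_row: length-T attention weights from the generating position.
    """
    num_tokens = len(features)
    num_channels = len(features[0])

    # Channel importance: mean gradient of the target logit over tokens.
    channel_weights = [
        sum(grads[t][c] for t in range(num_tokens)) / num_tokens
        for c in range(num_channels)
    ]

    # Gradient-weighted activation per token, with negatives clipped (ReLU).
    cam = [
        max(0.0, sum(features[t][c] * channel_weights[c]
                     for c in range(num_channels)))
        for t in range(num_tokens)
    ]

    # Spatial importance (assumption): scale each token by its attention weight.
    scores = [cam[t] * attn_row[t] for t in range(num_tokens)]

    # Normalize to sum to 1 when any token received a positive score.
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


# Made-up inputs: 3 tokens, 2 channels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
grads = [[1.0, -1.0], [1.0, -1.0], [1.0, -1.0]]
attn_row = [0.5, 0.3, 0.2]
heatmap = gradcam_style_token_scores(features, grads, attn_row)
print(heatmap)  # [1.0, 0.0, 0.0]
```

In this toy input only the first token has a positive gradient-weighted activation, so it receives all of the normalized score. In a real setting the features, gradients, and attention weights would come from a forward and backward pass through the model at each generation step, producing one heatmap per generated token.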

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Computational and Text Analysis Methods