Rethinking the Understanding Ability across LLMs through Mutual Information

Shaojie Wang; Sirui Ding; Na Zou

arXiv:2505.23790 · cs.CL · June 2, 2025

TL;DR

This paper introduces an information-theoretic framework, based on mutual information, for evaluating and enhancing the intrinsic linguistic understanding of large language models. It reveals differences between encoder-only and decoder-only models and shows that fine-tuning can improve understanding.

Contribution

It formalizes language understanding as the mutual information (MI) between an input sentence and its latent representation, derives a computable lower bound on token-level MI, and empirically compares models and the effects of fine-tuning.
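To make "computable lower bound" concrete, here is a minimal sketch of the standard Fano-style bound on token-level MI. It assumes H(X) is the entropy of a token in bits, the error rate is the probability that a token recovered from the sentence embedding differs from the true token, and the vocabulary size is |V|. The function names are hypothetical, and the paper's exact bound may include additional terms not shown here.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H_b(p) in bits; defined as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_mi_lower_bound(token_entropy: float, error_rate: float, vocab_size: int) -> float:
    """Hypothetical helper: Fano lower bound on I(X; Z) in bits.

    token_entropy: H(X), entropy of the token in bits
    error_rate:    P_e = P(recovered token != true token)
    vocab_size:    |V|, must be at least 2
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    # Fano's inequality: H(X | Z) <= H_b(P_e) + P_e * log2(|V| - 1)
    cond_entropy_bound = binary_entropy(error_rate) + error_rate * math.log2(vocab_size - 1)
    # I(X; Z) = H(X) - H(X | Z); MI is nonnegative, so clamp the bound at 0
    return max(0.0, token_entropy - cond_entropy_bound)
```

With illustrative (not reported) values |V| = 32000, H(X) = 10 bits, and a 10% recovery error rate, `fano_mi_lower_bound(10.0, 0.1, 32000)` gives roughly 8.03 bits. Lower error rates tighten the bound toward H(X).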

Findings

1. Encoder-only models retain higher mutual information than decoder-only models.
2. Decoder-only models show a late-layer 'forgetting' pattern in mutual information.
3. Fine-tuning to maximize token recoverability improves understanding ability.
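Finding 3 follows from the shape of a Fano-style bound. Whenever the recovery error rate is below 1 - 1/|V|, lowering it raises the lower bound on token-level MI, so training for recoverability pushes the bound up. The sketch below estimates the two empirical inputs such a bound needs: a plug-in estimate of token entropy and the token recovery error rate. This excerpt does not say how tokens are recovered from the sentence embedding. The `recovered_tokens` sequence is assumed to come from some decoder or probe, and both function names are hypothetical.

```python
from collections import Counter
import math


def plugin_entropy_bits(tokens):
    """Plug-in (maximum-likelihood) estimate of H(X) in bits from observed token ids.

    Note: this estimator is biased low when the sample is small relative to the vocabulary.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def recovery_error_rate(true_tokens, recovered_tokens):
    """Fraction of positions where the recovered token differs from the input token.

    recovered_tokens is assumed to come from a (hypothetical) decoder or probe
    applied to the sentence embedding.
    """
    if len(true_tokens) != len(recovered_tokens):
        raise ValueError("sequences must have equal length")
    if not true_tokens:
        raise ValueError("need at least one token")
    mismatches = sum(t != r for t, r in zip(true_tokens, recovered_tokens))
    return mismatches / len(true_tokens)
```

For example, with `true_tokens = [1, 2, 3, 4]` and `recovered_tokens = [1, 2, 3, 0]`, the entropy estimate is 2.0 bits and the error rate is 0.25.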

Abstract

Recent advances in large language models (LLMs) have revolutionized natural language processing, yet evaluating their intrinsic linguistic understanding remains challenging. Moving beyond specialized evaluation tasks, we propose an information-theoretic framework grounded in mutual information (MI) to achieve this. We formalize understanding as the MI between an input sentence and its latent representation (sentence-level MI), which measures how effectively input information is preserved in the latent representation. Given that LLMs learn embeddings for individual tokens, we decompose sentence-level MI into token-level MI between tokens and sentence embeddings, establishing theoretical bounds connecting these measures. Based on this foundation, we theoretically derive a computable lower bound for token-level MI using Fano's inequality, which directly relates to token-level recoverability, the…
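For reference, the standard form of Fano's inequality, named in the abstract as the source of the bound, can be rearranged into an MI lower bound as follows. The notation is ours, not the paper's. X is a token over vocabulary V, Z is the sentence embedding, \hat{X} = g(Z) is any token estimate computed from Z, and P_e is its error probability. The paper's exact bound may differ in form.

```latex
\begin{aligned}
H(X \mid Z) &\le H_b(P_e) + P_e \log\bigl(|\mathcal{V}| - 1\bigr),
  \qquad P_e = \Pr[\hat{X} \neq X],\; \hat{X} = g(Z), \\
I(X; Z) &= H(X) - H(X \mid Z)
  \;\ge\; H(X) - H_b(P_e) - P_e \log\bigl(|\mathcal{V}| - 1\bigr), \\
H_b(p) &= -p \log p - (1 - p) \log (1 - p).
\end{aligned}
```

The right-hand side depends only on the token entropy, the vocabulary size, and how often tokens can be recovered from Z. This is what links the bound to token-level recoverability.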

Taxonomy

Topics: Artificial Intelligence in Law