Entropy and Attention Dynamics in Small Language Models: A Trace-Level Structural Analysis on the TruthfulQA Benchmark
Adeyemi Adeseye, Aisvarya Adeseye, Hannu Tenhunen, Jouni Isoaho

TL;DR
This paper analyzes entropy and attention patterns inside small language models during decoding to understand their truthfulness and reliability, and identifies distinct entropy dynamics (decreasing, increasing, and stable) associated with different model behaviors.
Contribution
It introduces a trace-level analysis of entropy and attention dynamics in small language models, connecting internal behavior to output truthfulness and stability.
Findings
Deterministic models show decreasing entropy over time.
Exploratory models exhibit increasing entropy.
Balanced models maintain moderate, stable entropy.
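One way to make the three patterns above concrete is to fit a straight line to a per-token entropy trace and look at the sign of its slope. The sketch below is only an illustration: the function name `entropy_trend`, the tolerance `tol`, and the use of a least-squares slope are assumptions made here, not the criteria used in the paper.

```python
import numpy as np

def entropy_trend(entropies, tol=0.01):
    """Label an entropy trace by the slope of a least-squares line fit.

    entropies: per-token entropy values in decoding order (at least 2 values).
    tol: hypothetical slope threshold (entropy units per decoding step).
    """
    h = np.asarray(entropies, dtype=np.float64)
    steps = np.arange(h.size)
    # np.polyfit returns coefficients from highest degree down; [0] is the slope.
    slope = np.polyfit(steps, h, 1)[0]
    if slope < -tol:
        return "decreasing"  # pattern the listing attributes to deterministic models
    if slope > tol:
        return "increasing"  # pattern attributed to exploratory models
    return "stable"          # pattern attributed to balanced models
```

A trace such as `[2.0, 1.5, 1.0, 0.5]` would be labeled decreasing under this rule. A single slope ignores non-monotonic traces, so a windowed or piecewise fit may be more appropriate for long generations.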
Abstract
Small language models (SLMs) are increasingly deployed on edge devices and in other resource-constrained settings. However, these models make confident mispredictions and produce unstable output, making them risky for factual and decision-critical tasks. Current evaluation methodology relies on final accuracy or hallucination rates without explaining how internal model behavior affects outputs. In particular, it often overlooks how entropy evolves during decoding, how attention is distributed across layers, and how hidden representations contribute to uncertainty, logical inconsistencies, and misinformation propagation. This study therefore introduces a trace-level analysis of entropy and attention dynamics in SLMs evaluated on the TruthfulQA dataset. Four models ranging from 1B to 1.7B parameters were examined via token-level output entropy, attention entropy, head…
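The two quantities named in the abstract, token-level output entropy and attention entropy, are both Shannon entropies of probability distributions. The following is a minimal sketch of how they could be computed from raw arrays, assuming NumPy, logits of shape `(steps, vocab)`, and attention weights of shape `(heads, query, key)` whose rows sum to 1. The function names, array shapes, and the `eps` smoothing constant are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def shannon_entropy(probs, axis=-1, eps=1e-12):
    """Shannon entropy in nats along `axis`; `eps` guards against log(0)."""
    p = np.asarray(probs, dtype=np.float64)
    return -np.sum(p * np.log(p + eps), axis=axis)

def token_output_entropy(logits):
    """Entropy of the next-token distribution at each decoding step.

    logits: assumed shape (steps, vocab). Returns shape (steps,).
    """
    return shannon_entropy(softmax(logits))

def attention_entropy(attn):
    """Entropy of each query position's attention distribution per head.

    attn: assumed shape (heads, query, key), rows already normalized.
    Returns shape (heads, query).
    """
    return shannon_entropy(attn)
```

Low output entropy means the model concentrates probability on few tokens at that step, while low attention entropy means a head focuses on few key positions. How the paper aggregates these values across layers and heads is not stated in the truncated abstract.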
