# Information-Theoretical Analysis of a Transformer-Based Generative AI Model

**Authors:** Manas Deb, Tokunbo Ogunfunmi

PMC · DOI: 10.3390/e27060589 · Entropy · 2025-05-31

## TL;DR

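This paper applies information theory to Transformer-based language models: it treats each layer as an information transmission channel, computes that channel's capacity, and uses information planes and information geometry to show how the model encodes relationships between words.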

## Contribution

The study introduces information-theoretical tools, built on information planes and information geometry, for visualizing and analyzing what individual Transformer layers encode.

## Key findings

- Information-theoretical analysis reveals how Transformers encode word relationships in high-dimensional space.
- Information geometry, via geodesic lengths between vector distributions on a Riemannian manifold, reveals more about word relationships than attention scores do.
- The approach helps identify and troubleshoot learning issues in Transformer layers.
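
As a rough illustration of the information-geometry finding, the sketch below computes a geodesic length between two distributions. It rests on assumptions that are not taken from the paper: each word vector is summarized as a univariate Gaussian (mean and standard deviation of its components), and the distance is the closed-form Fisher-Rao geodesic distance on the manifold of univariate Gaussians. The function names `gaussian_summary` and `fisher_rao_distance` and the toy vectors are hypothetical.

```python
import math
import statistics


def gaussian_summary(vector):
    """Summarize a vector by the mean and population std of its components.

    Assumption for illustration only: the paper may model the
    vector distributions differently.
    """
    mu = statistics.fmean(vector)
    sigma = statistics.pstdev(vector)
    if sigma <= 0.0:
        raise ValueError("vector must have non-zero spread")
    return mu, sigma


def fisher_rao_distance(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Under the Fisher metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, the
    Gaussian family is a scaled hyperbolic half-plane, which gives the
    closed form below.
    """
    if sigma1 <= 0.0 or sigma2 <= 0.0:
        raise ValueError("standard deviations must be positive")
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (4.0 * sigma1 * sigma2))


if __name__ == "__main__":
    # Toy 4-dimensional "word vectors"; the values are made up.
    word_a = [0.2, -0.1, 0.4, 0.3]
    word_b = [0.1, -0.2, 0.5, 0.2]
    word_c = [2.0, -3.0, 4.0, -1.0]

    d_ab = fisher_rao_distance(*gaussian_summary(word_a), *gaussian_summary(word_b))
    d_ac = fisher_rao_distance(*gaussian_summary(word_a), *gaussian_summary(word_c))
    print(f"geodesic(a, b) = {d_ab:.4f}")
    print(f"geodesic(a, c) = {d_ac:.4f}")
```

A shorter geodesic means the two distributions are closer on the manifold. Unlike a Euclidean distance between means, this distance also accounts for differences in spread.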

## Abstract

Large language models have shown a remarkable ability to “converse” with humans in natural language across myriad topics. Despite the proliferation of these models, a deep understanding of how they work under the hood remains elusive. The core of these Generative AI models is composed of layers of neural networks that employ the Transformer architecture. This architecture learns from large amounts of training data and creates new content in response to user input. In this study, we analyze the internals of the Transformer using Information Theory. To quantify the amount of information passing through a layer, we view it as an information transmission channel and compute the capacity of the channel. The highlight of our study is that, using Information-Theoretical tools, we develop techniques to visualize on an Information plane how the Transformer encodes the relationship between words in sentences while these words are projected into a high-dimensional vector space. We use Information Geometry to analyze the high-dimensional vectors in the Transformer layer and infer relationships between words based on the length of the geodesic connecting these vector distributions on a Riemannian manifold. Our tools reveal more information about these relationships than attention scores. In this study, we also show how Information-Theoretic analysis can help in troubleshooting learning problems in the Transformer layers.
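
To make the "layer as a channel" idea concrete, here is a minimal sketch of computing the capacity of a discrete memoryless channel with the standard Blahut-Arimoto iteration. This summary does not say how the authors compute capacity, so the algorithm choice, the function name `channel_capacity`, and the toy transition matrix are illustrative assumptions rather than the paper's method.

```python
import math


def channel_capacity(transition, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto estimate of channel capacity in bits.

    transition[x][y] is P(Y = y | X = x); each row must sum to 1.
    Returns (capacity_bits, capacity-achieving input distribution).
    """
    n_in, n_out = len(transition), len(transition[0])
    p = [1.0 / n_in] * n_in  # start from a uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * transition[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp( KL( P(.|x) || q ) ), skipping zero-probability terms.
        c = [
            math.exp(sum(
                transition[x][y] * math.log(transition[x][y] / q[y])
                for y in range(n_out) if transition[x][y] > 0.0
            ))
            for x in range(n_in)
        ]
        z = sum(p[x] * c[x] for x in range(n_in))
        lower, upper = math.log(z), math.log(max(c))  # bounds in nats
        p = [p[x] * c[x] / z for x in range(n_in)]    # reweight inputs
        if upper - lower < tol:
            break
    return lower / math.log(2.0), p


if __name__ == "__main__":
    # Binary symmetric channel with crossover probability 0.1 (toy example).
    bsc = [[0.9, 0.1],
           [0.1, 0.9]]
    capacity, p_opt = channel_capacity(bsc)
    print(f"capacity = {capacity:.4f} bits, optimal input = {p_opt}")
```

For the binary symmetric channel the result can be checked against the closed form 1 − H₂(0.1) ≈ 0.531 bits. Applying the same idea to a real Transformer layer would first require estimating a transition matrix from discretized inputs and outputs, a step this sketch leaves out.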

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12191707/full.md

## Figures

45 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12191707/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12191707/full.md

---
Source: https://tomesphere.com/paper/PMC12191707