Quantifying Semantic Emergence in Language Models
Hang Chen, Xinyu Yang, Jiaying Zhu, Wenya Wang

TL;DR
This paper introduces a new metric called Information Emergence (IE) to quantify how well large language models extract meaningful semantics from input tokens, providing insights into their semantic understanding capabilities.
Contribution
The paper proposes a novel, task- and architecture-agnostic metric, Information Emergence, for measuring semantic extraction in language models, along with a lightweight estimator for mutual information.
Findings
IE reveals patterns consistent with linguistic knowledge
Some IE patterns are unexpected, offering new insights
Experiments validate IE's informativeness in different contexts
Abstract
Large language models (LLMs) are widely recognized for their exceptional capacity to capture semantic meaning. Yet, there remains no established metric to quantify this capability. In this work, we introduce a quantitative metric, Information Emergence (IE), designed to measure LLMs' ability to extract semantics from input tokens. We formalize "semantics" as the meaningful information abstracted from a sequence of tokens and quantify this by comparing the entropy reduction observed for a sequence of tokens (macro-level) and individual tokens (micro-level). To achieve this, we design a lightweight estimator to compute the mutual information at each transformer layer, which is agnostic to different tasks and language model architectures. We apply IE in both synthetic in-context learning (ICL) scenarios and natural sentence contexts. Experiments demonstrate informativeness and patterns…
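The macro-versus-micro comparison described in the abstract can be illustrated with a toy calculation. The sketch below is one possible reading, not the paper's definition or its estimator: it assumes discrete variables, uses a simple plug-in mutual information estimate, and defines a hypothetical `emergence_gap` as the macro-level mutual information minus the average micro-level mutual information. The function names and the XOR data are illustrative assumptions.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


def emergence_gap(macro_pairs, micro_pairs_list):
    """Hypothetical score: macro-level MI minus the mean micro-level MI.

    This is an assumed stand-in for the idea in the abstract, not the
    authors' IE formula.
    """
    macro = mutual_information(macro_pairs)
    micro = sum(mutual_information(p) for p in micro_pairs_list) / len(micro_pairs_list)
    return macro - micro


# Toy data: two binary "tokens" a and b, and an output y = a XOR b.
samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
macro_pairs = [((a, b), y) for a, b, y in samples]  # whole sequence vs. output
micro_a = [(a, y) for a, _, y in samples]           # first token alone vs. output
micro_b = [(b, y) for _, b, y in samples]           # second token alone vs. output

gap = emergence_gap(macro_pairs, [micro_a, micro_b])
print(gap)  # 1.0 for this XOR toy case
```

In this toy case each token alone says nothing about the output, while the pair determines it fully, so all of the information appears only at the sequence level. Real hidden states are continuous and high-dimensional, which is why a dedicated estimator such as the one the paper proposes would be needed in practice.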
Taxonomy
Topics: Language and cultural evolution · Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Discriminative Fine-Tuning · Multi-Head Attention · Dense Connections · Cosine Annealing · Attention Dropout · Weight Decay
