Model Internal Sleuthing: Finding Lexical Identity and Inflectional Features in Modern Language Models

Michael Li; Nishant Subramani

arXiv:2506.02132 · cs.CL · April 23, 2026

TL;DR

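This study systematically probes 25 transformer models, from BERT Base to Qwen2.5-7B, to understand how they encode lexical identity and inflectional features across six languages. It finds a consistent pattern: inflectional features stay linearly decodable throughout the network, while lexical identity is strongest in early layers and fades with depth.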

Contribution

The study provides the first comprehensive analysis of how modern language models encode lexical and inflectional information across six diverse languages and model sizes ranging from BERT Base to Qwen2.5-7B.

Findings

01. Inflectional features are linearly decodable throughout the model.

02. Lexical identity is prominent in early layers but weakens with depth.

03. Models with aggressive mid-layer dimensionality compression show reduced steering effectiveness in those layers, even though probe accuracy remains high.
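Findings 01 and 02 rest on what "linearly decodable" means: a linear classifier trained on one layer's hidden states can predict the feature (for example, a word's grammatical number or tense). The sketch below illustrates layer-by-layer probing under stated assumptions. It uses a closed-form, ridge-regularized least-squares probe in NumPy rather than whatever classifier the authors used. The helper names (`fit_linear_probe`, `layerwise_probe_accuracy`) and the synthetic "layers" are hypothetical stand-ins for real activations, which could be collected, for instance, by calling a Hugging Face Transformers model with `output_hidden_states=True`.

```python
import numpy as np


def fit_linear_probe(X, y, n_classes, ridge=1e-2):
    """Closed-form ridge-regularized least-squares probe on one-hot targets.

    X: (n_samples, d) activations; y: integer labels in [0, n_classes).
    Returns weights of shape (d + 1, n_classes); the last row is the bias.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    Y = np.eye(n_classes)[y]  # one-hot targets
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


def layerwise_probe_accuracy(layers, y, n_classes, train_frac=0.8, seed=0):
    """Train and evaluate one probe per layer on a shared random split."""
    idx = np.random.default_rng(seed).permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    return [
        probe_accuracy(fit_linear_probe(H[tr], y[tr], n_classes), H[te], y[te])
        for H in layers
    ]


# Toy data (hypothetical): one "layer" encodes a 3-way feature, one does not.
rng = np.random.default_rng(0)
n, d, k = 600, 16, 3
labels = rng.integers(0, k, size=n)
means = np.zeros((k, d))
means[np.arange(k), np.arange(k)] = 4.0  # class j is shifted along dimension j
encoded = means[labels] + rng.normal(size=(n, d))
unencoded = rng.normal(size=(n, d))
accs = layerwise_probe_accuracy([encoded, unencoded], labels, k)
print(accs)
```

A layer where the feature is linearly encoded scores near 1.0, while a layer carrying no information about it stays near chance (about 1/3 for three classes). Plotting this accuracy across a real model's layers is one way to visualize the trends described in findings 01 and 02.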

Abstract

Large transformer-based language models dominate modern NLP, yet our understanding of how they encode linguistic information relies primarily on studies of early models such as BERT and GPT-2. We systematically probe 25 models, from BERT Base to Qwen2.5-7B, focusing on two linguistic properties, lexical identity and inflectional features, across 6 diverse languages. We find a consistent pattern: inflectional features are linearly decodable throughout the model, while lexical identity is prominent early but weakens progressively with depth. Further analysis of the representation geometry reveals that models with aggressive mid-layer dimensionality compression show reduced steering effectiveness in those layers, despite probe accuracy remaining high. Pretraining analysis shows that inflectional structure stabilizes early while lexical identity representations continue evolving. Taken together,…
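The abstract mentions mid-layer dimensionality compression and steering but does not say here how either is measured. The sketch below assumes two common choices, neither confirmed as the authors' own. The first is the participation ratio, (Σλ)² / Σλ², computed over the eigenvalues λ of a layer's activation covariance as a measure of effective dimensionality. The second is a steering vector built as the difference between two classes' mean activations and added to a hidden state. All function names and the toy data are hypothetical.

```python
import numpy as np


def participation_ratio(H):
    """Effective dimensionality of activations H (n_samples, d):
    (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    cov = Hc.T @ Hc / (H.shape[0] - 1)
    # eigvalsh suits symmetric matrices; clip tiny negative round-off to zero
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())


def mean_difference_direction(H, labels, source, target):
    """Steering direction: mean activation of `target` minus mean of `source`."""
    return H[labels == target].mean(axis=0) - H[labels == source].mean(axis=0)


def steer(h, direction, alpha=1.0):
    """Add a scaled steering direction to one or more hidden states."""
    return h + alpha * direction


# Toy data (hypothetical): compare a full-rank layer with a compressed one.
rng = np.random.default_rng(0)
n, d = 2000, 10
isotropic = rng.normal(size=(n, d))
compressed = np.zeros((n, d))
compressed[:, :2] = rng.normal(size=(n, 2))  # variance confined to 2 directions
pr_iso = participation_ratio(isotropic)
pr_comp = participation_ratio(compressed)

# Two classes that differ only along the first axis.
labels = rng.integers(0, 2, size=n)
H = isotropic + 3.0 * labels[:, None] * np.eye(d)[0]
v = mean_difference_direction(H, labels, source=0, target=1)
steered = steer(H[labels == 0], v, alpha=1.0)
print(f"PR isotropic={pr_iso:.2f}, compressed={pr_comp:.2f}")
```

Computing the participation ratio per layer would expose mid-layer compression as a dip in effective dimensionality. In the toy data, isotropic activations score close to the ambient dimension of 10, while activations confined to two directions score at most 2. Adding the mean-difference vector with `alpha=1.0` moves the source class's mean exactly onto the target class's mean, which is the basic idea behind mean-difference steering.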

Code & Models

Repositories

ml5885/model_internal_sleuthing (GitHub)