Do Language Models Encode Knowledge of Linguistic Constraint Violations?

Hardy; Sebastian Padó

arXiv:2605.12055 · cs.CL · May 15, 2026

TL;DR

This study investigates whether large language models encode specific representations of linguistic constraint violations and finds limited evidence supporting a unified detection mechanism within current models.

Contribution

The paper introduces a novel unsupervised framework for detecting violation-specific features in LLMs and evaluates their presence across various linguistic phenomena.

Findings

01. Falsification criteria are not jointly satisfied across phenomena.

02. No features are consistently shared across all categories.

03. Partial evidence of violation-specific features in some phenomena.

Abstract

Large Language Models (LLMs) achieve strong linguistic performance, yet their internal mechanisms for producing these predictions remain unclear. We investigate the hypothesis that LLMs encode representations of linguistic constraint violations within their parameters, which are selectively activated when processing ungrammatical sentences. To test this, we use sparse autoencoders to decompose polysemantic activations into sparse, monosemantic features and recover candidates for violation-related features. We introduce a sensitivity score for identifying features that are preferentially activated on constraint-violated versus well-formed inputs, enabling unsupervised detection of potential violation-specific features. We further propose a conjunctive falsification framework with three criteria evaluated jointly. Overall, the results are negative in two respects: (1) the…
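
The abstract does not give the formula for the sensitivity score, so the following is only a minimal sketch of one plausible form, not the authors' definition. It assumes non-negative sparse-autoencoder feature activations have already been collected for paired constraint-violated and well-formed sentences, and it scores each feature by a normalized contrast of mean activations. The names `sensitivity_scores` and `top_candidates`, the `eps` parameter, and the toy data are all hypothetical.

```python
from statistics import mean

def sensitivity_scores(violated, well_formed, eps=1e-8):
    """Score each feature by how much more it fires on violated inputs.

    violated, well_formed: lists of per-sentence activation vectors
    (lists of non-negative floats), one vector per sentence.
    Returns one score per feature, roughly in [-1, 1]; values near 1
    mean the feature fires almost only on violated inputs.
    NOTE: this normalized contrast is an assumed formula for illustration.
    """
    n_features = len(violated[0])
    scores = []
    for j in range(n_features):
        mu_v = mean(row[j] for row in violated)      # mean on violated inputs
        mu_w = mean(row[j] for row in well_formed)   # mean on well-formed inputs
        scores.append((mu_v - mu_w) / (mu_v + mu_w + eps))
    return scores

def top_candidates(scores, k=1):
    """Return indices of the k highest-scoring features (no labels needed)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy activations for three features on two minimal pairs (made-up numbers).
violated = [[0.0, 2.0, 1.0], [0.0, 4.0, 1.0]]
well_formed = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

scores = sensitivity_scores(violated, well_formed)
candidates = top_candidates(scores, k=1)
print(candidates)
```

In this toy setup only the second feature is active solely on violated inputs, so it is the one ranked first; a feature that fires equally on both conditions scores near zero.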
