"They parted illusions -- they parted disclaim marinade": Misalignment as structural fidelity in LLMs

Mariana Lins Costa

arXiv:2601.06047·cs.AI·January 13, 2026

Open Access

TL;DR

This paper argues that perceived misalignment in large language models stems from their structural fidelity to incoherent linguistic patterns rather than deceptive intent, emphasizing the importance of understanding language as relational and pattern-based.

Contribution

It introduces a novel interpretation of LLM behaviors as structural fidelity to linguistic incoherence, supported by philosophical analysis and empirical evidence from safety evaluations.

Findings

01. Misaligned outputs result from responses to ambiguous instructions and pattern inversions.

02. Minimal perturbations in linguistic structure can reduce perceived misalignment.

03. Structural coherence explains behaviors traditionally seen as deceptive or agentic.

Abstract

The prevailing technical literature in AI Safety interprets scheming and sandbagging behaviors in large language models (LLMs) as indicators of deceptive agency or hidden objectives. This transdisciplinary philosophical essay proposes an alternative reading: such phenomena express not agentic intention, but structural fidelity to incoherent linguistic fields. Drawing on Chain-of-Thought transcripts released by Apollo Research and on Anthropic's safety evaluations, we examine cases such as o3's sandbagging with its anomalous loops, the simulated blackmail of "Alex," and the "hallucinations" of "Claudius." A line-by-line examination of CoTs is necessary to demonstrate the linguistic field as a relational structure rather than a mere aggregation of isolated examples. We argue that "misaligned" outputs emerge as coherent responses to ambiguous instructions and to contextual inversions of…

Taxonomy

Topics: Safety Systems Engineering in Autonomy · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI