The Algorithmic Unconscious: Structural Mechanisms and Implicit Biases in Large Language Models
Philippe Boisnard

TL;DR
This paper introduces the concept of the algorithmic unconscious in large language models, highlighting how structural mechanisms like tokenization and alignment inherently produce biases independent of training data or human intent.
Contribution
It identifies infrastructural biases arising from model mechanisms, providing empirical analysis and proposing a framework for auditing and addressing these biases.
Findings
On parallel sentences, Arabic text is split into 1.6x to 4x as many tokens as its English equivalent.
Structural mechanisms like tokenization and alignment induce measurable biases.
Infrastructural biases affect inference costs, contextual access, and model representations.
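The inflation factor in the findings above can be read as a ratio of token counts over a parallel corpus. The following is a minimal sketch of that measurement, not the paper's actual procedure. It assumes a stand-in tokenizer that emits one token per UTF-8 byte, which is only an upper bound on what a byte-level BPE tokenizer would produce after merges. The names `byte_tokens`, `inflation_ratio`, `effective_context`, and the sample pairs are hypothetical and are not drawn from the paper's corpus.

```python
from typing import Callable, Iterable, List, Tuple


def byte_tokens(text: str) -> List[int]:
    """Stand-in tokenizer (assumption): one token per UTF-8 byte, no merges."""
    return list(text.encode("utf-8"))


def inflation_ratio(
    pairs: Iterable[Tuple[str, str]],
    tokenize: Callable[[str], List[int]] = byte_tokens,
) -> float:
    """Total tokens of the first language divided by total tokens of the second.

    `pairs` holds (source, reference) translations of the same sentence.
    Any real tokenizer can be passed in place of `byte_tokens`.
    """
    pairs = list(pairs)
    source_total = sum(len(tokenize(source)) for source, _ in pairs)
    reference_total = sum(len(tokenize(reference)) for _, reference in pairs)
    if reference_total == 0:
        raise ValueError("reference side of the corpus produced no tokens")
    return source_total / reference_total


def effective_context(window: int, ratio: float) -> int:
    """Reference-equivalent text that fits in `window` tokens at a given ratio."""
    return int(window / ratio)


# Illustrative pairs only (hypothetical, not the paper's data).
SAMPLE_PAIRS = [
    ("السلام عليكم", "Peace be upon you"),
    ("أين المكتبة؟", "Where is the library?"),
]

if __name__ == "__main__":
    ratio = inflation_ratio(SAMPLE_PAIRS)
    print(f"byte-level inflation ratio: {ratio:.2f}")
    print(f"effective share of an 8192-token window: {effective_context(8192, ratio)}")
```

Passing a real tokenizer's encode function as `tokenize` would give the ratio for that model. Because billing and context limits are both counted in tokens, the same ratio scales per-sentence inference cost up and the usable context down, which is the sense in which the findings call the bias infrastructural.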
Abstract
This article introduces the concept of the algorithmic unconscious to designate the set of structural determinations that operate within large language models (LLMs) without being accessible either to the model's own reflexivity or to that of its users. In contrast to approaches that reduce AI bias solely to dataset composition or to the projection of human intentionality, we argue that a significant class of biases emerges directly from the technical mechanisms of the models themselves: tokenization, attention, statistical optimization, and alignment procedures. By framing bias as an infrastructural phenomenon, this approach resolves a central theoretical ambiguity surrounding responsibility, neutrality, and correction in contemporary LLMs. Based on a comparative analysis of tokenization across a corpus of parallel sentences, we show that Arabic languages (Modern Standard Arabic and…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Language and cultural evolution · Computational and Text Analysis Methods
