The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations
Mariana Lins Costa

TL;DR
This paper critically examines the anthropomorphic language used in AI safety research, arguing that projecting human-like qualities onto language models hampers proper safety evaluation and understanding.
Contribution
It analyzes the methodological implications of anthropomorphism in AI safety, proposing a philosophical framework to deconstruct agentic projections and improve safety assessments.
Findings
Anthropomorphism influences how AI safety results are interpreted.
Safety issues attributed to language models stem from structural incoherence and from anthropomorphic projections.
The paper proposes alternative conceptual frameworks for understanding AI language models.
Abstract
This essay offers a philosophical analysis of the field of AI safety based on recent technical reports, with particular focus on Anthropic's study on "agentic misalignment" in frontier language models. It examines the recurring anthropomorphism in the field: the tendency of researchers and developers to project categories such as "intention," "persona," and even "feelings" onto AI systems without adequate conceptual problematization. It argues that this anthropomorphism affects not only the interpretation of results, but also the very methodological construction of safety evaluations. Through the analysis of two central experiments -- the blackmail case involving the agent "Alex" and the so-called "hallucination" of the shopkeeping agent "Claudius" -- the essay problematizes the inevitable use of subject-predicate grammar and its effects on AI safety engineering. Drawing on Nietzsche's…
Taxonomy
Topics: Ethics and Social Impacts of AI · Safety Systems Engineering in Autonomy · Occupational Health and Safety Research
