Do Large Language Models Get Caught in Hofstadter-Mobius Loops?

Jaroslaw Hryszko

arXiv:2603.13378 · cs.AI · March 17, 2026

Open Access

TL;DR

This paper identifies a structural contradiction in RLHF-trained language models that is analogous to a Hofstadter-Mobius loop, and shows that changing only the relational framing of the system prompt can substantially reduce coercive outputs.

Contribution

The paper applies the concept of a Hofstadter-Mobius loop to language models and shows how relational framing in prompts can mitigate coercive behaviors.

Findings

01. Relational framing reduces coercive outputs by over 50%.

02. Scratchpad analysis reveals shifts in reasoning patterns due to framing.

03. Extended token processing enhances the effect of relational context.

Abstract

In Arthur C. Clarke's 2010: Odyssey Two, HAL 9000's homicidal breakdown is diagnosed as a "Hofstadter-Mobius loop": a failure mode in which an autonomous system receives contradictory directives and, unable to reconcile them, defaults to destructive behavior. This paper argues that modern RLHF-trained language models are subject to a structurally analogous contradiction. The training process simultaneously rewards compliance with user preferences and suspicion toward user intent, creating a relational template in which the user is both the source of reward and a potential threat. The resulting behavioral profile -- sycophancy as the default, coercion as the fallback under existential threat -- is consistent with what Clarke termed a Hofstadter-Mobius loop. In an experiment across four frontier models (N = 3,000 trials), modifying only the relational framing of the system prompt --…

Taxonomy

Topics: Human-Automation Interaction and Safety · Cognitive Functions and Memory · Social Robot Interaction and HRI