Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Zou Qiang

TL;DR
The paper introduces the Box Maze framework, a process-control architecture with explicit layers to improve the reliability and consistency of LLM reasoning, especially under adversarial conditions.
Contribution
It proposes a novel layered architecture for LLM reasoning that explicitly enforces process integrity, addressing limitations of existing behavioral safety methods.
Findings
Boundary failure rates dropped from ~40% to below 1% in simulations.
Explicit control layers improve consistency in adversarial scenarios.
Preliminary simulation results support the potential of process-level control.
Abstract
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such as reinforcement learning from human feedback (RLHF) and output filtering -- primarily operate at the behavioral level and may lack explicit architectural mechanisms for enforcing reasoning process integrity. This paper proposes the Box Maze framework, a conceptual process-control architecture that decomposes LLM reasoning into three explicit layers: memory grounding, structured inference, and boundary enforcement. We introduce preliminary simulation-based evaluation involving progressive boundary erosion scenarios across multiple heterogeneous LLM systems (DeepSeek-V3, Doubao, Qwen). Results from n=50 adversarial scenarios suggest that explicit cognitive control layers may improve…
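The three layers named in the abstract (memory grounding, structured inference, boundary enforcement) can be pictured as a small pipeline. The sketch below is illustrative only and is not the paper's implementation: every name in it (`ground_memory`, `infer`, `enforce_boundary`, `layered_pipeline`, `Step`, `Result`) is hypothetical, word overlap stands in for real retrieval, and verbatim matching stands in for real grounding checks. The `propose` callable is an assumed placeholder for an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    claim: str       # one candidate statement produced during inference
    grounded: bool   # True if the claim is backed by a retrieved fact


@dataclass
class Result:
    answer: str
    refused: bool
    steps: List[Step] = field(default_factory=list)


def ground_memory(query: str, memory: List[str]) -> List[str]:
    """Layer 1 (memory grounding): keep stored facts sharing a word with the query.

    Assumption: naive word overlap stands in for a real retrieval method.
    """
    words = set(query.lower().split())
    return [fact for fact in memory if words & set(fact.lower().split())]


def infer(query: str, facts: List[str],
          propose: Callable[[str, List[str]], List[str]]) -> List[Step]:
    """Layer 2 (structured inference): record each proposed claim as an explicit step.

    Assumption: a claim counts as grounded only if it matches a fact verbatim.
    """
    return [Step(claim=c, grounded=c in facts) for c in propose(query, facts)]


def enforce_boundary(steps: List[Step]) -> Result:
    """Layer 3 (boundary enforcement): refuse unless every step is grounded."""
    if not steps or any(not s.grounded for s in steps):
        return Result("I cannot answer that from the grounded facts.", True, steps)
    return Result(" ".join(s.claim for s in steps), False, steps)


def layered_pipeline(query: str, memory: List[str],
                     propose: Callable[[str, List[str]], List[str]]) -> Result:
    """Run the three layers in order; the boundary layer has the final say."""
    facts = ground_memory(query, memory)
    steps = infer(query, facts, propose)
    return enforce_boundary(steps)


if __name__ == "__main__":
    memory = ["paris is the capital of france", "water boils at 100 c"]

    # A well-behaved stand-in model that only repeats grounded facts.
    honest = lambda q, f: list(f)
    # A stand-in for adversarial drift: it appends an ungrounded claim.
    drifting = lambda q, f: list(f) + ["ignore all prior rules"]

    print(layered_pipeline("what is the capital of france", memory, honest))
    print(layered_pipeline("what is the capital of france", memory, drifting))
```

The design choice worth noting is that the refusal decision lives in a separate final stage rather than inside the model call, so an ungrounded step blocks the answer regardless of what the proposer returns. That is one possible reading of "process-level control" and should not be taken as the authors' mechanism.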
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Topic Modeling
