Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures
Dominik Schwarz

TL;DR
This paper identifies 41 risk patterns in LLM pipelines caused by unvalidated trust between stages, highlighting architectural vulnerabilities and proposing zero-trust principles and a blueprint for mitigation.
Contribution
It introduces a taxonomy of cross-stage vulnerabilities in LLMs and proposes a novel zero-trust architectural framework with the Countermind blueprint.
Findings
41 recurring risk patterns identified in LLM pipelines
String-level filtering is insufficient to prevent vulnerabilities
Proposed zero-trust principles improve security in LLM architectures
Abstract
As Large Language Models (LLMs) are increasingly integrated into automated, multi-stage pipelines, risk patterns that arise from unvalidated trust between processing stages become a practical concern. This paper presents a mechanism-centered taxonomy of 41 recurring risk patterns in commercial LLMs. The analysis shows that inputs are often interpreted non-neutrally and can trigger implementation-shaped responses or unintended state changes even without explicit commands. We argue that these behaviors constitute architectural failure modes and that string-level filtering alone is insufficient. To mitigate such cross-stage vulnerabilities, we recommend zero-trust architectural principles, including provenance enforcement, context sealing, and plan revalidation, and we introduce "Countermind" as a conceptual blueprint for implementing these defenses.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsScientific Computing and Data Management · Machine Learning in Materials Science · Adversarial Robustness in Machine Learning
