Taming AI Bots: Controllability of Neural States in Large Language Models
Stefano Soatto, Paulo Tabuada, Pratik Chaudhari, Tian Yu Liu

TL;DR
This paper investigates the controllability of large language models, demonstrating that, under certain conditions, an AI bot can reach any meaningful state, raising both opportunities and risks for steering AI behavior.
Contribution
The paper provides a formal framework for understanding AI bot controllability, characterizes the space of reachable meanings, and establishes conditions for almost certain state reachability.
Findings
AI bots are controllable within the meaning space.
A well-trained LLM can reach any meaning, albeit only with small probability.
Controllability poses both risks and opportunities for AI safety.
Abstract
We tackle the question of whether an agent can, by suitable choice of prompts, control an AI bot to any state. To that end, we first introduce a formal definition of "meaning" that is amenable to analysis. Then, we characterize "meaningful data" on which large language models (LLMs) are ostensibly trained, and "well-trained LLMs" through conditions that are largely met by today's LLMs. While a well-trained LLM constructs an embedding space of meanings that is Euclidean, meanings themselves do not form a vector (linear) subspace, but rather a quotient space within. We then characterize the subset of meanings that can be reached by the state of the LLMs for some input prompt, and show that a well-trained bot can reach any meaning albeit with small probability. We then introduce a stronger notion of controllability as almost certain reachability, and show that, when restricted…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
