Taming AI Bots: Controllability of Neural States in Large Language Models

Stefano Soatto; Paulo Tabuada; Pratik Chaudhari; Tian Yu Liu

arXiv:2305.18449 · cs.AI · May 31, 2023 · 1 citation

Open Access

TL;DR

This paper investigates the controllability of large language models, demonstrating that, under certain conditions, an AI bot can reach any meaningful state, raising both opportunities and risks for steering AI behavior.

Contribution

It provides a formal framework for understanding AI bot controllability, characterizes the space of reachable meanings, and establishes conditions for almost certain state reachability.

Findings

01. AI bots are controllable within the meaning space.

02. A well-trained LLM can reach any meaning with small probability.

03. Controllability poses both risks and opportunities for AI safety.

Abstract

We tackle the question of whether an agent can, by suitable choice of prompts, control an AI bot to any state. To that end, we first introduce a formal definition of "meaning" that is amenable to analysis. Then, we characterize "meaningful data" on which large language models (LLMs) are ostensibly trained, and "well-trained LLMs" through conditions that are largely met by today's LLMs. While a well-trained LLM constructs an embedding space of meanings that is Euclidean, meanings themselves do not form a vector (linear) subspace, but rather a quotient space within. We then characterize the subset of meanings that can be reached by the state of the LLMs for some input prompt, and show that a well-trained bot can reach any meaning albeit with small probability. We then introduce a stronger notion of controllability as almost certain reachability, and show that, when restricted…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)