Controlling Large Language Model Agents with Entropic Activation Steering
Nate Rahn, Pierluca D'Oro, Marc G. Bellemare

TL;DR
This paper introduces Entropic Activation Steering (EAST), an activation steering method that controls exploration in in-context LLM agents by intervening on their internal representations rather than on token-level sampling.
Contribution
EAST provides a new representation-level approach to steer LLM agents' exploration and uncertainty, outperforming token-level methods and generalizing across tasks.
Findings
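EAST manipulates an agent's exploration by directly affecting the high-level actions parsed from the LLM's outputs, in contrast to token-level temperature sampling.
Applying EAST modulates the uncertainty expressed in the LLM's thoughts and guides the agent towards more exploratory actions.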
Steering vectors from EAST generalize across different task variants.
Abstract
The rise of large language models (LLMs) has prompted increasing interest in their use as in-context learning agents. At the core of agentic behavior is the capacity for exploration, or the ability to actively gather information about the environment. But how do LLM agents explore, and how can we control their exploratory behaviors? To answer these questions, we take a representation-level perspective, and introduce Entropic Activation Steering (EAST), an activation steering method for in-context LLM agents. Firstly, we demonstrate that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM, in contrast to token-level temperature sampling. Secondly, we reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions. Finally,…
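The abstract contrasts steering at the level of parsed actions with token-level temperature sampling, but it does not spell out how a steering vector is built. The following is a minimal pure-Python sketch of one plausible reading, not the authors' implementation. It assumes that (1) exploration is measured as the entropy of the empirical distribution over actions parsed from several sampled completions, (2) a steering direction is an entropy-weighted average of activation vectors recorded for different prompts, and (3) steering adds a scaled copy of that direction to a hidden state. The function names, the toy two-dimensional activations, and the multiplier `beta` are all hypothetical.

```python
import math
from collections import Counter


def action_entropy(actions):
    """Shannon entropy (in nats) of the empirical distribution over parsed actions."""
    counts = Counter(actions)
    total = len(actions)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def entropy_weighted_direction(activations, entropies):
    """Average activation vectors, weighting each by its action entropy.

    Assumption: prompts whose sampled actions were more varied contribute
    more to the steering direction.
    """
    weight_sum = sum(entropies)
    if weight_sum == 0:
        raise ValueError("all entropies are zero; no direction can be estimated")
    dim = len(activations[0])
    return [
        sum(h[i] * w for h, w in zip(activations, entropies)) / weight_sum
        for i in range(dim)
    ]


def steer(hidden, direction, beta):
    """Add a scaled steering direction to a hidden state (hypothetical multiplier beta)."""
    return [h + beta * d for h, d in zip(hidden, direction)]


# Toy data: two prompts, each with sampled high-level actions and one activation vector.
samples_a = ["arm_1", "arm_1", "arm_1", "arm_1"]  # the agent always picks the same action
samples_b = ["arm_1", "arm_2", "arm_1", "arm_2"]  # the agent splits between two actions
entropies = [action_entropy(samples_a), action_entropy(samples_b)]
activations = [[1.0, 0.0], [0.0, 1.0]]

direction = entropy_weighted_direction(activations, entropies)
steered = steer([0.5, 0.5], direction, beta=2.0)
```

In this toy setup the first prompt has zero action entropy, so the direction is determined entirely by the second prompt's activation. In a real model the addition would be applied to a chosen layer's activations during generation, a detail this sketch leaves out.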
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation
