Can large language models explore in-context?
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins

TL;DR
This paper evaluates whether large language models can naturally perform exploration in decision-making tasks without training, finding that they generally require external interventions like summarization to exhibit robust exploratory behavior.
Contribution
The study systematically assesses the native exploration capabilities of LLMs in multi-armed bandit environments, highlighting the need for external interventions, such as summarizing the interaction history, to obtain effective exploration.
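To make the bandit setting concrete, here is a minimal sketch of a Bernoulli multi-armed bandit together with UCB1, a standard exploration algorithm from the bandit literature. It is not the paper's code, and the class name, function name, and arm means are hypothetical choices for illustration. UCB1 shows what deliberate exploration looks like: each arm's empirical mean gets an optimism bonus that shrinks as the arm is pulled more, so under-sampled arms keep being tried.

```python
import math
import random


class BernoulliBandit:
    """Hypothetical K-armed bandit: arm i pays 1 with probability means[i], else 0."""

    def __init__(self, means, seed=0):
        self.means = list(means)
        self.rng = random.Random(seed)  # seeded for reproducibility

    def pull(self, arm):
        return 1 if self.rng.random() < self.means[arm] else 0


def ucb1(bandit, num_arms, horizon):
    """Run UCB1: pull each arm once, then always pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_i). Returns pull counts and history."""
    counts = [0] * num_arms
    totals = [0.0] * num_arms
    history = []
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(
                range(num_arms),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = bandit.pull(arm)
        counts[arm] += 1
        totals[arm] += reward
        history.append((arm, reward))
    return counts, history


if __name__ == "__main__":
    env = BernoulliBandit([0.5, 0.6], seed=1)  # hypothetical arm means
    counts, _ = ucb1(env, num_arms=2, horizon=200)
    print(counts)
```

With well-separated arm means, the better arm should receive most of the pulls while the worse arm is still sampled occasionally; an agent that never revisits an arm after a few unlucky rewards is failing to explore.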
Findings
GPT-4 with chain-of-thought reasoning and an externally summarized interaction history is the only configuration that shows satisfactory exploratory behavior.
Most configurations do not exhibit robust exploration without interventions.
External summarization is crucial for enabling exploration in LLM agents.
Abstract
We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust…
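The key intervention in the abstract is replacing the raw interaction history with per-arm summary statistics. For Bernoulli rewards, each arm's pull count and total (equivalently, average) reward are sufficient statistics. The sketch below collapses a hypothetical list of (arm, reward) pairs into that form and renders it as prompt text; the function names, arm labels, and wording are assumptions for illustration, not the paper's actual prompt.

```python
def summarize_history(history, num_arms):
    """Collapse a raw (arm, reward) history into per-arm sufficient
    statistics: number of pulls and average reward (None if never pulled)."""
    counts = [0] * num_arms
    totals = [0.0] * num_arms
    for arm, reward in history:
        counts[arm] += 1
        totals[arm] += reward
    means = [totals[i] / counts[i] if counts[i] else None for i in range(num_arms)]
    return counts, means


def render_summary(counts, means, labels):
    """Render the summary as plain prompt text (hypothetical format)."""
    lines = []
    for label, n, mean in zip(labels, counts, means):
        if n == 0:
            lines.append(f"{label}: pulled 0 times")
        else:
            lines.append(f"{label}: pulled {n} times, average reward {mean:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    raw = [(0, 1), (1, 0), (0, 0), (1, 1), (1, 1)]  # hypothetical history
    counts, means = summarize_history(raw, num_arms=3)
    print(render_summary(counts, means, ["blue", "green", "red"]))
    # blue: pulled 2 times, average reward 0.50
    # green: pulled 3 times, average reward 0.67
    # red: pulled 0 times
```

The summary stays a fixed size no matter how long the interaction runs, so the model no longer has to aggregate a growing list of past rounds itself; that aggregation is exactly the step the findings suggest LLMs do not perform reliably on their own.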
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Label Smoothing · Transformer · Attention Dropout · Cosine Annealing · Multi-Head Attention
