Large Language Models for Sequential Decision-Making: Improving In-Context Learning via Supervised Fine-Tuning

Minmin Zhang; Sina Aghaei; Soroush Saghafian

arXiv:2605.09009 · cs.LG · May 12, 2026

TL;DR

This paper shows that supervised fine-tuning on offline data improves the ability of large language models to perform sequential decision-making tasks, outperforming in-context learning alone.

Contribution

The paper introduces a framework for fine-tuning LLMs for sequential decision-making, providing theoretical insights and empirical evidence of improved performance in complex environments.

Findings

01. Fine-tuned LLMs achieve smaller optimality gaps than in-context-only baselines.

02. Supervised fine-tuning improves decision-making in longer-horizon, partially observed, and ambiguous environments.

03. For linear MDPs, the theoretical analysis interprets a fine-tuned attention layer as implicitly estimating optimal Q-functions from in-context data.
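
To build intuition for how an attention layer could act as a Q-function estimator, the sketch below treats softmax attention as a weighted average of in-context return labels. It is an illustration only and not the paper's construction: the function names, the feature vectors, the `temperature` parameter, and the use of observed returns as attention values are all assumptions made for this example.

```python
import math

def attention_q_estimate(query, keys, values, temperature=1.0):
    """Softmax-attention estimate of Q for one (state, action) feature vector.

    query:  feature vector phi(s, a) for the pair being evaluated (assumed given)
    keys:   feature vectors phi(s_i, a_i) of the in-context examples
    values: return labels observed for those in-context examples
    """
    # Dot-product similarity between the query and each in-context key.
    scores = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
    # Numerically stable softmax weights.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    # The estimate is a convex combination of the in-context labels.
    return sum(w * v for w, v in zip(weights, values)) / total

def greedy_action(action_features, keys, values, temperature=1.0):
    """Pick the action whose attention-based Q estimate is largest."""
    estimates = {
        action: attention_q_estimate(phi, keys, values, temperature)
        for action, phi in action_features.items()
    }
    return max(estimates, key=estimates.get), estimates

# Hypothetical in-context data: two examples with orthogonal features.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [1.0, 0.0]
action_features = {"left": [1.0, 0.0], "right": [0.0, 1.0]}
best, estimates = greedy_action(action_features, keys, values, temperature=0.1)
```

With a low temperature the weights concentrate on the most similar in-context example, so the estimate for each action approaches the label of its nearest neighbor in feature space. A trained attention layer would learn its own query, key, and value projections rather than use raw features as done here.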

Abstract

Large language models (LLMs) have shown remarkable in-context learning (ICL) capabilities, yet their potential for sequential decision-making remains underexplored. In this paper, we study the ICL capabilities of LLMs in sequential decision-making settings, including Markov Decision Processes (MDPs), Partially Observable MDPs (POMDPs), and Ambiguous POMDPs (APOMDPs). We fine-tune pretrained LLMs to perform few-shot decision-making directly from offline, oracle-labeled trajectories. Our framework enables flexible imitation of policies through supervised fine-tuning (SFT). Theoretically, we focus on linear MDPs and interpret a fine-tuned attention layer as implicitly estimating optimal Q-functions from in-context data. Building on this interpretation, we derive an end-to-end suboptimality bound for the induced policy that separates the in-context estimation error from the training-length…
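
As a rough illustration of what fine-tuning on offline, oracle-labeled trajectories could look like in practice, the sketch below turns trajectories into prompt/completion pairs for supervised fine-tuning. The prompt template, the function names `format_step` and `build_sft_examples`, and the dictionary keys are hypothetical choices for this example; the abstract does not specify the authors' data format.

```python
def format_step(observation, action=None):
    """Render one step as text; leave the action blank when it is the target."""
    line = f"Observation: {observation}\nAction:"
    return line if action is None else f"{line} {action}"

def build_sft_examples(demo_trajectories, query_trajectory):
    """Build (prompt, completion) pairs from oracle-labeled trajectories.

    demo_trajectories: list of trajectories used as few-shot context; each
                       trajectory is a list of (observation, oracle_action)
    query_trajectory:  trajectory whose oracle actions become training targets
    """
    # Few-shot context: fully labeled demonstration trajectories.
    context = "\n\n".join(
        "\n".join(format_step(obs, act) for obs, act in trajectory)
        for trajectory in demo_trajectories
    )
    examples = []
    history = []  # labeled steps of the query trajectory seen so far
    for observation, oracle_action in query_trajectory:
        parts = [context] + history + [format_step(observation)]
        prompt = "\n".join(part for part in parts if part)
        # The completion is the oracle's action for the current observation.
        examples.append({"prompt": prompt, "completion": f" {oracle_action}"})
        history.append(format_step(observation, oracle_action))
    return examples

# Hypothetical toy data: one demonstration and one two-step query trajectory.
demos = [[("s0", "right"), ("s1", "right")]]
query = [("s0", "right"), ("s2", "left")]
examples = build_sft_examples(demos, query)
```

Each example asks the model to predict only the next oracle action given the demonstrations and the history so far, which matches the imitation-style objective the abstract describes. In a partially observed setting the history carries information the current observation lacks, which is why this sketch keeps earlier steps in the prompt.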
