DecisionLLM: Large Language Models for Long Sequence Decision Exploration

Xiaowei Lv; Zhilin Zhang; Yijun Li; Yusen Huo; Siyuan Ju; Xuyan Li; Chunxiang Hong; Tianyu Wang; Yongcai Wang; Peng Sun; Chuan Yu; Jian Xu; Bo Zheng

arXiv:2601.10148·cs.AI·January 16, 2026

Open Access

TL;DR

This paper explores using large language models for long-horizon decision-making tasks, addressing challenges in interpreting continuous data and demonstrating strong performance improvements over existing methods.

Contribution

The paper introduces DecisionLLM, a framework that aligns trajectory data with natural language so that LLMs can perform decision exploration, and it establishes scaling laws for performance.

Findings

1. DecisionLLM-3B outperforms Decision Transformer by 69.4 in Maze2D.

2. DecisionLLM achieves better results in offline benchmarks and bidding scenarios.

3. The framework highlights the importance of model scale, data volume, and data quality.

Abstract

Long-sequence decision-making, which is usually addressed through reinforcement learning (RL), is a critical component for optimizing strategic operations in dynamic environments, such as real-time bidding in computational advertising. The Decision Transformer (DT) introduced a powerful paradigm by framing RL as an autoregressive sequence modeling problem. Concurrently, Large Language Models (LLMs) have demonstrated remarkable success in complex reasoning and planning tasks. This raises the question of whether LLMs, which share the same Transformer foundation but operate at a much larger scale, can unlock new levels of performance in long-horizon sequential decision-making problems. This work investigates the application of LLMs to offline decision-making tasks. A fundamental challenge in this domain is the LLMs' inherent inability to interpret continuous values, as they lack a native…
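To make the abstract's reference to "framing RL as an autoregressive sequence modeling problem" concrete, the sketch below flattens one offline trajectory into the interleaved (return-to-go, state, action) order commonly associated with the Decision Transformer. It is a minimal illustration under stated assumptions, not the DecisionLLM pipeline: the function names `returns_to_go` and `to_sequence`, the tagged-tuple token format, and the undiscounted return are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted suffix sums: rtg[t] = r[t] + r[t+1] + ... + r[T-1]."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step adds its own reward to the future total.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def to_sequence(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    rewards: Sequence[float],
) -> List[Tuple[str, object]]:
    """Interleave (return-to-go, state, action) per timestep.

    Hypothetical token format: each element is a (kind, value) pair that a
    sequence model would embed and predict autoregressively.
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions, and rewards must have equal length")
    sequence: List[Tuple[str, object]] = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence.append(("rtg", g))
        sequence.append(("state", list(s)))
        sequence.append(("action", list(a)))
    return sequence


# Toy three-step trajectory (made-up numbers, for illustration only).
example = to_sequence(
    states=[[0.0, 0.0], [0.5, 0.0], [1.0, 0.5]],
    actions=[[1.0], [1.0], [0.0]],
    rewards=[1.0, 0.0, 2.0],
)
```

In this layout the model is trained to predict each action token from the tokens that precede it, so conditioning on a high initial return-to-go at inference time asks the model for a high-return continuation. The values here are continuous floats, which is exactly the kind of input the abstract says LLMs struggle to interpret natively.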

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing · Advanced Bandit Algorithms Research