Prompt Tuning Decision Transformers with Structured and Scalable Bandits

Finn Rietz; Oleg Smirnov; Sara Karimi; Lele Cao

arXiv:2502.04979·cs.LG·October 2, 2025

TL;DR

This paper introduces a bandit-based prompt tuning method for Decision Transformers in offline RL, improving task generalization and scalability by learning optimal prompts at inference time with theoretical guarantees.

Contribution

It proposes a structured bandit architecture for prompt construction, leveraging pre-trained PDT features, and provides theoretical regret bounds with empirical performance improvements.

Findings

01. Achieves linear scaling with prompt size

02. Enhances performance across diverse tasks and environments

03. Outperforms existing prompt tuning baselines

Abstract

Prompt tuning has emerged as a key technique for adapting large pre-trained Decision Transformers (DTs) in offline Reinforcement Learning (RL), particularly in multi-task and few-shot settings. The Prompting Decision Transformer (PDT) enables task generalization via trajectory prompts sampled uniformly from expert demonstrations -- without accounting for prompt informativeness. In this work, we propose a bandit-based prompt-tuning method that learns to construct optimal trajectory prompts from demonstration data at inference time. We devise a structured bandit architecture operating in the trajectory prompt space, achieving linear rather than combinatorial scaling with prompt size. Additionally, we show that the pre-trained PDT itself can serve as a powerful feature extractor for the bandit, enabling efficient reward modeling across various environments. We theoretically establish…
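The scaling claim in the abstract can be illustrated with a small sketch. It assumes a prompt made of `prompt_len` slots, each filled with one of `num_segments` candidate trajectory segments. Treating every full prompt as one arm gives `num_segments ** prompt_len` arms, whereas keeping one small bandit per slot gives `num_segments * prompt_len` arms. The class below is a plain per-slot epsilon-greedy bandit chosen for brevity; it is not the paper's architecture, which additionally uses PDT features for reward modeling. All names (`count_arms`, `SlotwiseEpsilonGreedy`, `toy_return`) are hypothetical, and `toy_return` is a stand-in for the return of a real PDT rollout.

```python
import random


def count_arms(num_segments, prompt_len):
    """Return (joint, structured) arm counts for a prompt with `prompt_len` slots."""
    joint = num_segments ** prompt_len       # one arm per complete prompt
    structured = num_segments * prompt_len   # one arm per (slot, segment) pair
    return joint, structured


class SlotwiseEpsilonGreedy:
    """Illustrative only: one independent epsilon-greedy bandit per prompt slot."""

    def __init__(self, num_segments, prompt_len, epsilon=0.1, seed=0):
        self.num_segments = num_segments
        self.prompt_len = prompt_len
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Per-slot pull counts and running mean rewards.
        self.counts = [[0] * num_segments for _ in range(prompt_len)]
        self.values = [[0.0] * num_segments for _ in range(prompt_len)]

    def select_prompt(self):
        """Pick one segment index per slot; the prompt is their concatenation."""
        prompt = []
        for slot in range(self.prompt_len):
            if self.rng.random() < self.epsilon:
                arm = self.rng.randrange(self.num_segments)  # explore
            else:
                vals = self.values[slot]
                arm = max(range(self.num_segments), key=vals.__getitem__)  # exploit
            prompt.append(arm)
        return prompt

    def update(self, prompt, reward):
        """Credit the observed reward to the chosen arm in every slot."""
        for slot, arm in enumerate(prompt):
            self.counts[slot][arm] += 1
            n = self.counts[slot][arm]
            self.values[slot][arm] += (reward - self.values[slot][arm]) / n


def toy_return(prompt, best):
    """Hypothetical reward: fraction of slots whose choice matches `best`."""
    return sum(a == b for a, b in zip(prompt, best)) / len(best)


if __name__ == "__main__":
    print(count_arms(num_segments=10, prompt_len=3))
    bandit = SlotwiseEpsilonGreedy(num_segments=10, prompt_len=3, epsilon=0.2)
    best = [7, 2, 5]  # assumed "most informative" segment per slot
    for _ in range(500):
        prompt = bandit.select_prompt()
        bandit.update(prompt, toy_return(prompt, best))
    print(bandit.select_prompt())
```

In this sketch the reward of a full prompt is shared by all slots, so each slot learns from the same rollout while the number of value estimates grows only linearly in the prompt length. In a real setting, `toy_return` would be replaced by the episodic return obtained when the model is conditioned on the selected prompt.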

Videos

Prompt Tuning Decision Transformers with Structured and Scalable Bandits · SlidesLive

Taxonomy

Topics: Data Stream Mining Techniques · Advanced Bandit Algorithms Research

Methods: Attention Is All You Need · Label Smoothing · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam