Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL

Finn Rietz; Oleg Smirnov; Sara Karimi; Lele Cao

arXiv:2502.06358·cs.LG·July 21, 2025

Open Access

TL;DR

This paper introduces a bandit-based prompt-tuning framework for offline multi-task reinforcement learning, improving task generalization and sample efficiency without fine-tuning large models.

Contribution

It proposes an inference-time bandit approach to optimize trajectory prompts, addressing limitations of uniform prompt sampling in multi-task offline RL.

Findings

01. Enhanced task performance with bandit prompt-tuning

02. Improved sample efficiency and scalability

03. Better exploration of prompt space compared to baselines

Abstract

Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline Reinforcement Learning (RL) pre-training by leveraging stochastic trajectory prompts to identify the target task. However, these prompts are sampled uniformly from expert demonstrations, overlooking a critical limitation: not all prompts are equally informative for differentiating between tasks. This limits generalization and adaptation, especially in low-data or open-world settings where sample efficiency is crucial. To address this issue, we propose a lightweight, inference-time, bandit-based prompt-tuning framework. The bandit explores and optimizes trajectory prompt selection to enhance task performance, while avoiding costly fine-tuning of the transformer backbone. Our…
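To make the idea of inference-time prompt selection concrete, here is a minimal sketch that treats each candidate trajectory prompt as a bandit arm and uses UCB1 to decide which prompt to try next. This is an illustration only, not the authors' method: the abstract does not say which bandit algorithm is used, so UCB1 is a stand-in, and `UCB1PromptBandit`, `simulated_rollout`, `tune_prompt`, and the per-prompt "quality" values are all hypothetical. In a real setup, `simulated_rollout` would be replaced by conditioning the frozen transformer on the chosen prompt and measuring the episode return.

```python
import math
import random


class UCB1PromptBandit:
    """UCB1 over a fixed set of candidate trajectory prompts (one arm per prompt)."""

    def __init__(self, num_prompts, exploration=2.0):
        self.counts = [0] * num_prompts   # times each prompt was tried
        self.means = [0.0] * num_prompts  # running mean reward per prompt
        self.exploration = exploration

    def select(self):
        # Try every prompt once before relying on confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(self.exploration * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen prompt.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulated_rollout(prompt_quality, rng):
    """Hypothetical stand-in for a rollout of the frozen policy.

    Returns a noisy, normalized return in [0, 1] centered on the
    (assumed) quality of the prompt.
    """
    return min(1.0, max(0.0, rng.gauss(prompt_quality, 0.1)))


def tune_prompt(prompt_qualities, rounds=300, seed=0):
    """Run the bandit and return (index of best prompt, pull counts)."""
    rng = random.Random(seed)
    bandit = UCB1PromptBandit(len(prompt_qualities))
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, simulated_rollout(prompt_qualities[arm], rng))
    best = max(range(len(bandit.means)), key=bandit.means.__getitem__)
    return best, bandit.counts
```

Calling `tune_prompt([0.2, 0.5, 0.9, 0.4])` runs the loop on four made-up prompts. Only the bandit's statistics are updated between rollouts; no model weights change, which mirrors the abstract's point that the backbone is not fine-tuned. Uniform prompt sampling corresponds to replacing `select` with a random choice.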

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding