Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches

Dan Elbaz; Oren Salzman

arXiv:2502.10473 · cs.AI · February 18, 2025

TL;DR

This paper introduces Portfolio Beam Search, a decoding method for offline reinforcement learning with Transformers. The method is inspired by financial algorithms, and it improves exploration while reducing the variability of outcomes.

Contribution

It proposes Portfolio Beam Search, an uncertainty-aware diversification technique that improves exploration and stability in offline RL decoding with Transformers.
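The excerpt does not describe how Portfolio Beam Search allocates its beams. As a hedged illustration of the general financial idea it draws on, here is a Markowitz-style mean-variance selection. It greedily picks candidates so that their equally weighted "portfolio" has a high mean return and a low variance across sampled return estimates. The function `mean_variance_select`, the `samples` array (treated as hypothetical ensemble return estimates per candidate), and the `risk_aversion` parameter are all assumptions made for this sketch. This is not the paper's algorithm.

```python
import numpy as np


def mean_variance_select(samples: np.ndarray, k: int, risk_aversion: float) -> list[int]:
    """Greedily pick k candidates under a mean-variance objective.

    Hypothetical sketch: samples[i, j] is the j-th sampled return estimate
    for candidate i (e.g. from an ensemble), so spread across j stands in
    for uncertainty. For a chosen set S, the objective is the mean of the
    equally weighted portfolio return minus risk_aversion times its variance.
    """
    chosen: list[int] = []
    for _ in range(k):
        best_idx, best_val = -1, -np.inf
        for i in range(samples.shape[0]):
            if i in chosen:
                continue
            # Per-sample return of the equally weighted set chosen + [i].
            portfolio = samples[chosen + [i]].mean(axis=0)
            value = portfolio.mean() - risk_aversion * portfolio.var()
            if value > best_val:
                best_idx, best_val = i, value
        chosen.append(best_idx)
    return chosen


samples = np.array([
    [12.0, 0.0, 12.0, 0.0],  # candidate 0: mean 6, variance 36
    [11.0, 1.0, 11.0, 1.0],  # candidate 1: mean 6, variance 25, moves with candidate 0
    [0.0, 10.0, 0.0, 10.0],  # candidate 2: mean 5, moves against candidates 0 and 1
])
print(mean_variance_select(samples, k=2, risk_aversion=0.0))  # [0, 1]
print(mean_variance_select(samples, k=2, risk_aversion=0.1))  # [1, 2]
```

With no risk penalty, the selection keeps the two highest-mean candidates, which rise and fall together. With a modest penalty, it pairs candidate 1 with candidate 2 because their errors offset each other. This is the diversification effect that a portfolio view rewards.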

Findings

1. Higher returns on the D4RL locomotion benchmark
2. Significant reduction in outcome variability
3. Enhanced exploration during decoding

Abstract

Offline Reinforcement Learning (RL) algorithms learn a policy using a fixed training dataset, which is then deployed online to interact with the environment and make decisions. Transformers, a standard choice for modeling time-series data, are gaining popularity in offline RL. In this context, Beam Search (BS), an approximate inference algorithm, is the go-to decoding method. Offline RL eliminates the need for costly or risky online data collection. However, the restricted dataset induces uncertainty as the agent may encounter unfamiliar sequences of states and actions during execution that were not covered in the training data. In this context, BS lacks two important properties essential for offline RL: It does not account for the aforementioned uncertainty, and its greedy left-to-right search approach often results in sequences with minimal variations, failing to explore potentially…
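For readers unfamiliar with the baseline the abstract criticizes, here is a minimal sketch of generic left-to-right beam search. It assumes a hypothetical `step_log_probs` callable that stands in for a trained Transformer's next-token distribution. Beams are ranked here by cumulative log-probability, whereas offline-RL decoders may rank them by predicted return instead. This is a textbook illustration, not code from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beam_search(
    step_log_probs: Callable[[Tuple[int, ...]], Sequence[float]],
    beam_width: int,
    horizon: int,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Generic left-to-right beam search over a discrete vocabulary.

    step_log_probs(prefix) is a hypothetical model returning one
    log-probability per token for the next position given the prefix.
    """
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]
    for _ in range(horizon):
        candidates = []
        for prefix, score in beams:
            # Expand every surviving beam by every possible next token.
            for token, logp in enumerate(step_log_probs(prefix)):
                candidates.append((prefix + (token,), score + logp))
        # Greedy pruning: keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix: Tuple[int, ...]) -> List[float]:
    # Hypothetical model that ignores the prefix and always prefers token 0.
    return [math.log(0.6), math.log(0.3), math.log(0.1)]


for seq, score in beam_search(toy_model, beam_width=3, horizon=3):
    print(seq, round(math.exp(score), 3))
```

With this toy model, every surviving beam is at most one token away from the top beam `(0, 0, 0)`. This is a small instance of the low-diversity behavior the abstract describes.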

Taxonomy

Topics: Evolutionary Algorithms and Applications · Energy Load and Power Forecasting · Neural Networks and Applications

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax