Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches
Dan Elbaz, Oren Salzman

TL;DR
This paper introduces Portfolio Beam Search, a novel decoding method for offline reinforcement learning with Transformers, inspired by financial algorithms, which enhances exploration and reduces variability in decision sequences.
Contribution
It proposes Portfolio Beam Search, an uncertainty-aware diversification technique that improves exploration and stability in offline RL decoding with Transformers.
Findings
Higher returns on D4RL locomotion benchmark
Significant reduction in outcome variability
Enhanced exploration during decoding
Abstract
Offline Reinforcement Learning (RL) algorithms learn a policy using a fixed training dataset, which is then deployed online to interact with the environment and make decisions. Transformers, a standard choice for modeling time-series data, are gaining popularity in offline RL. In this context, Beam Search (BS), an approximate inference algorithm, is the go-to decoding method. Offline RL eliminates the need for costly or risky online data collection. However, the restricted dataset induces uncertainty as the agent may encounter unfamiliar sequences of states and actions during execution that were not covered in the training data. In this context, BS lacks two important properties essential for offline RL: It does not account for the aforementioned uncertainty, and its greedy left-right search approach often results in sequences with minimal variations, failing to explore potentially…
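To make the baseline concrete, here is a minimal sketch of plain left-to-right beam search, the decoding method the abstract contrasts against. It is not the paper's Portfolio Beam Search. The names `beam_search`, `step_log_probs`, and `toy_model` are hypothetical, and the fixed three-token distribution is an assumed stand-in for a trained Transformer's next-token output.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[int, ...]


def beam_search(
    step_log_probs: Callable[[Sequence], Dict[int, float]],
    horizon: int,
    beam_width: int,
) -> List[Tuple[Sequence, float]]:
    """Plain beam search: keep the `beam_width` highest-scoring prefixes at each step."""
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(horizon):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            # Expand every surviving prefix by every possible next token.
            for token, logp in step_log_probs(seq).items():
                candidates.append((seq + (token,), score + logp))
        # Greedy pruning: only cumulative log-probability decides what survives.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix: Sequence) -> Dict[int, float]:
    # Assumption: a prefix-independent distribution over three tokens,
    # used only to illustrate the search, not a real sequence model.
    probs = {0: 0.6, 1: 0.3, 2: 0.1}
    return {tok: math.log(p) for tok, p in probs.items()}


beams = beam_search(toy_model, horizon=4, beam_width=3)
for seq, score in beams:
    print(seq, round(math.exp(score), 4))
```

With this toy model, every surviving beam differs from the highest-scoring sequence in at most one position, which illustrates the low-diversity behavior the abstract describes. Nothing in the pruning step accounts for uncertainty about the model's predictions either.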
Taxonomy
Topics: Evolutionary Algorithms and Applications · Energy Load and Power Forecasting · Neural Networks and Applications
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax
