Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning

Dan Elbaz; Gal Novik; Oren Salzman

arXiv:2211.04583·cs.LG·December 7, 2022

TL;DR

This paper introduces a risk-aware planning algorithm for offline reinforcement learning that integrates modern portfolio theory, improving decision stability and balancing reward maximization with risk mitigation.

Contribution

The paper proposes a novel risk-aware planning method for offline RL that incorporates modern portfolio theory (MPT), improving stability and performance relative to existing Transformer-based approaches.

Findings

01. Achieves state-of-the-art performance on offline RL tasks.

02. Reduces variance and increases stability of decision-making.

03. Effectively balances reward and risk in offline RL environments.

Abstract

Offline reinforcement-learning (RL) algorithms learn to make decisions from a given, fixed training dataset, without online data collection. This setting is appealing because it promises to make use of previously collected datasets without any costly or risky interaction with the environment. However, the same restriction is also the setting's drawback: the limited dataset induces uncertainty, because the agent can encounter unfamiliar sequences of states and actions that the training data did not cover. To mitigate the destructive effects of this uncertainty, we need to balance the aspiration to take reward-maximizing actions against the risk incurred by incorrect ones. In financial economics, modern portfolio theory (MPT) is a method that risk-averse investors can use to construct diversified portfolios that maximize their returns without unacceptable levels of risk. We…
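The reward-versus-risk balance described in the abstract can be illustrated with a mean-variance score, the textbook form of the MPT trade-off. The sketch below is not the paper's algorithm: the function names, the `risk_aversion` parameter, and the sampled returns are all hypothetical, chosen only to show how penalizing variance can change which action is preferred.

```python
from statistics import mean, pvariance


def mean_variance_score(returns, risk_aversion=1.0):
    """Mean-variance utility: expected return minus a penalty on variance.

    `risk_aversion` is a hypothetical knob; 0.0 ignores risk entirely.
    """
    return mean(returns) - risk_aversion * pvariance(returns)


def pick_action(candidate_returns, risk_aversion=1.0):
    """Return the action whose sampled returns score best."""
    return max(
        candidate_returns,
        key=lambda a: mean_variance_score(candidate_returns[a], risk_aversion),
    )


# Hypothetical sampled returns for two candidate actions (illustration only).
candidates = {
    "safe": [9.0, 10.0, 11.0],    # lower mean, tightly clustered
    "risky": [0.0, 12.0, 24.0],   # higher mean, widely spread
}

risk_neutral_choice = pick_action(candidates, risk_aversion=0.0)
risk_averse_choice = pick_action(candidates, risk_aversion=0.5)
print(risk_neutral_choice, risk_averse_choice)
```

With the penalty switched off, the higher-mean action wins; with a moderate penalty, the tightly clustered action is preferred. How the paper itself estimates returns and risk during planning is not stated in the truncated abstract above.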

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Stock Market Forecasting Methods

Methods: Attention Is All You Need · Label Smoothing · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Linear Layer · Multi-Head Attention · Adam · Absolute Position Encodings · Layer Normalization