# Advantage Amplification in Slowly Evolving Latent-State Environments

**Authors:** Martin Mladenov, Ofer Meshi, Jayden Ooi, Dale Schuurmans, Craig Boutilier

arXiv: 1905.13559 · 2019-06-03

## TL;DR

This paper introduces advantage amplification, a principle that uses temporal abstraction to make reinforcement learning workable in long-horizon, latent-state environments such as recommender systems, where belief state error and small action advantages otherwise hinder learning.

## Contribution

The paper proposes a general principle of advantage amplification, develops several aggregation methods and proves that they induce amplification in certain settings, bounds the resulting loss in optimality when latent state evolves slowly, and evaluates the methods empirically.

## Key findings

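- Advantage amplification through temporal abstraction can overcome hurdles such as belief state error and small action advantage.
- The proposed aggregation methods provably induce amplification in certain settings, with bounded loss in optimality when latent state evolves slowly.
- The methods are demonstrated empirically in a stylized user-modeling task.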

## Abstract

Latent-state environments with long horizons, such as those faced by recommender systems, pose significant challenges for reinforcement learning (RL). In this work, we identify and analyze several key hurdles for RL in such environments, including belief state error and small action advantage. We develop a general principle of advantage amplification that can overcome these hurdles through the use of temporal abstraction. We propose several aggregation methods and prove they induce amplification in certain settings. We also bound the loss in optimality incurred by our methods in environments where latent state evolves slowly and demonstrate their performance empirically in a stylized user-modeling task.
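
To make the idea of a small action advantage concrete, the following is a minimal sketch on a toy problem, not the paper's environment or its aggregation methods. It assumes a fully observed chain whose state drifts slowly (probability `p_move` per step), a reward that grows with the state, and one simple form of temporal abstraction: holding an action fixed for `k` steps. The names `chain_transition` and `held_action_advantage` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def chain_transition(n_states, p_move, direction):
    """Transition matrix for a chain that moves one step in `direction`
    with probability p_move and otherwise stays put (clipped at the ends)."""
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        nxt = min(max(s + direction, 0), n_states - 1)
        P[s, nxt] += p_move
        P[s, s] += 1.0 - p_move
    return P


def held_action_advantage(k, n_states=11, p_move=0.05, gamma=0.95):
    """Per-state gap between holding "up" and holding "down" for k steps,
    with the always-"up" policy followed afterwards (toy assumption)."""
    reward = np.linspace(0.0, 1.0, n_states)  # reward increases with state
    P_up = chain_transition(n_states, p_move, +1)
    P_down = chain_transition(n_states, p_move, -1)

    # Value of always choosing "up", which is optimal in this toy chain
    # because the reward is increasing in the state.
    V = np.linalg.solve(np.eye(n_states) - gamma * P_up, reward)

    def q_hold(P):
        # Apply the fixed-action Bellman backup k times, starting from V.
        q = V.copy()
        for _ in range(k):
            q = reward + gamma * P @ q
        return q

    return q_hold(P_up) - q_hold(P_down)


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        adv = held_action_advantage(k)
        print(f"k={k}: advantage at middle state = {adv[len(adv) // 2]:.4f}")
```

Because the state moves slowly, a single-step choice changes the return very little, so the one-step advantage is small. Holding the action for several steps widens the gap between the two actions at every state of this toy chain, which is the intuition behind amplification; the paper's own methods and guarantees are given in the full text.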

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13559/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13559/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/1905.13559/full.md

---
Source: https://tomesphere.com/paper/1905.13559