# Reinforcement Learning in Non-Stationary Environments

**Authors:** Sindhu Padakandla, Prabuchandran K. J., and Shalabh Bhatnagar

arXiv: 1905.03970 · 2020-06-08

## TL;DR

This paper develops a reinforcement learning method that adapts to non-stationary environments: a change point algorithm detects shifts in the statistics of the environment, and an RL algorithm then maximizes the long-term discounted reward. The approach is validated on non-stationary random MDPs, a sensor energy management problem, and a traffic signal control problem.

## Contribution

It integrates a change point detection algorithm with RL to handle non-stationarity, relaxing the stationary-environment assumption that standard RL methods rely on.

## Key findings

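- The adapted change point method effectively detects changes in the model of the environment.
- Detecting these changes helps the RL algorithm maximize the long-run reward as the underlying model changes over time.
- The approach is validated on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.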

## Abstract

Reinforcement learning (RL) methods learn optimal decisions in the presence of a stationary environment. However, the stationary assumption on the environment is very restrictive. In many real world problems like traffic signal control, robotic applications, one often encounters situations with non-stationary environments and in these scenarios, RL methods yield sub-optimal decisions. In this paper, we thus consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal of this problem is to maximize the long-term discounted reward achieved when the underlying model of the environment changes over time. To achieve this, we first adapt a change point algorithm to detect change in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change point method detects change in the model of the environment effectively and thus facilitates the RL algorithm in maximizing the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem and a traffic signal control problem.
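The two-stage idea in the abstract (detect a change in the environment's statistics, then let the RL algorithm adapt) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's method: the detector is a generic two-sided CUSUM-style mean-shift test standing in for the paper's change point algorithm, the environment is a hypothetical one-state, two-action toy problem with deterministic rewards, and the names `MeanShiftDetector`, `run`, and `greedy` are invented here. On each alarm the learner simply starts a fresh Q-table.

```python
class MeanShiftDetector:
    """Two-sided CUSUM-style test for a shift in the mean of a scalar stream.

    Illustrative stand-in only; not the change point algorithm used in the paper.
    """

    def __init__(self, warmup=10, drift=0.1, threshold=2.0):
        self.warmup, self.drift, self.threshold = warmup, drift, threshold
        self.reset()

    def reset(self):
        self.samples = []            # warm-up samples for the reference mean
        self.mean = None             # reference mean, set after warm-up
        self.g_pos = self.g_neg = 0.0

    def update(self, x):
        """Feed one observation; return True if a change is flagged."""
        if self.mean is None:
            self.samples.append(x)
            if len(self.samples) == self.warmup:
                self.mean = sum(self.samples) / self.warmup
            return False
        # Accumulate evidence for an upward and a downward shift.
        self.g_pos = max(0.0, self.g_pos + (x - self.mean) - self.drift)
        self.g_neg = max(0.0, self.g_neg - (x - self.mean) - self.drift)
        if max(self.g_pos, self.g_neg) > self.threshold:
            self.reset()
            return True
        return False


def run(rewards_by_context, change_at, steps, alpha=0.1, gamma=0.9):
    """Tabular Q-learning on a one-state toy problem whose rewards switch once.

    rewards_by_context[c][a] is the deterministic reward of action a in
    context c (hypothetical setup). The learner never sees the context.
    """
    n_actions = len(rewards_by_context[0])
    tables = [[0.0] * n_actions]     # one Q-table per detected context
    detectors = [MeanShiftDetector() for _ in range(n_actions)]
    alarms = []
    for t in range(steps):
        context = 0 if t < change_at else 1   # hidden from the learner
        action = t % n_actions                # round-robin behaviour policy
        reward = rewards_by_context[context][action]
        # One detector per action, so action choice is not mistaken for change.
        if detectors[action].update(reward):
            alarms.append(t)
            tables.append([0.0] * n_actions)  # start learning the new context
            for d in detectors:
                d.reset()
        q = tables[-1]
        # Q-learning update; with a single state the next state is the same.
        q[action] += alpha * (reward + gamma * max(q) - q[action])
    return tables, alarms


def greedy(q):
    """Index of the action with the largest Q-value."""
    return max(range(len(q)), key=q.__getitem__)


if __name__ == "__main__":
    tables, alarms = run([[1.0, 0.0], [0.0, 1.0]], change_at=200, steps=400)
    print("alarm steps:", alarms)
    print("greedy action per detected context:", [greedy(q) for q in tables])
```

The round-robin behaviour policy is chosen only to keep the sketch deterministic; Q-learning is off-policy, so the greedy policy can still be read from each table. A practical variant would use an exploratory policy such as epsilon-greedy and would need a detector suited to noisy, multi-dimensional experience rather than a scalar reward stream.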

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.03970/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.03970/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/1905.03970/full.md

---
Source: https://tomesphere.com/paper/1905.03970