# Geometry of Policy Improvement

**Authors:** Guido Montufar, Johannes Rauh

arXiv: 1704.01785 · 2017-04-07

## TL;DR

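This paper studies the geometry of optimal memoryless, time-independent decision making when the agent has limited information about the world state. It shows that the expected long-term reward, discounted or per time step, is maximized by policies that randomize among at most $k$ actions whenever at most $k$ world states are consistent with the agent's observation. The main tool is a geometric version of the policy improvement lemma.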

## Contribution

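The paper introduces a geometric version of the policy improvement lemma, which identifies a polyhedral cone of policy changes in which the state value function increases for all states. With this tool it shows that reward-maximizing memoryless policies need to randomize among at most $k$ actions when at most $k$ world states are consistent with the agent's observation, and that the expected reward per time step can be studied in terms of the expected discounted reward.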

## Key findings

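- The expected long-term reward, discounted or per time step, is maximized by policies that randomize among at most $k$ actions whenever at most $k$ world states are consistent with the agent's observation.
- The expected reward per time step can be studied in terms of the expected discounted reward.
- A geometric version of the policy improvement lemma identifies a polyhedral cone of policy changes in which the state value function increases for all states.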
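The link between per-time-step reward and discounted reward can be illustrated with a standard identity for finite Markov chains. The notation here is an assumption for illustration and is not taken from the paper: $P_\pi$ is the state transition matrix induced by a fixed policy $\pi$, $r_\pi$ is the expected one-step reward vector, and $\gamma \in [0, 1)$ is the discount factor.

$$
V_\gamma^\pi = \sum_{t \ge 0} \gamma^t P_\pi^t r_\pi = (I - \gamma P_\pi)^{-1} r_\pi,
\qquad
\lim_{\gamma \to 1^-} (1 - \gamma)\, V_\gamma^\pi = P_\pi^{*} r_\pi,
\qquad
P_\pi^{*} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} P_\pi^t .
$$

The vector $P_\pi^{*} r_\pi$ is the expected reward per time step from each start state, so statements about the discounted value carry over to the average reward in the limit $\gamma \to 1$. How the paper itself sets up this limit may differ in detail.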

## Abstract

We investigate the geometry of optimal memoryless time independent decision making in relation to the amount of information that the acting agent has about the state of the system. We show that the expected long term reward, discounted or per time step, is maximized by policies that randomize among at most $k$ actions whenever at most $k$ world states are consistent with the agent's observation. Moreover, we show that the expected reward per time step can be studied in terms of the expected discounted reward. Our main tool is a geometric version of the policy improvement lemma, which identifies a polyhedral cone of policy changes in which the state value function increases for all states.
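The policy improvement idea in the abstract can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the paper's construction: it applies the classical policy improvement lemma to the state policy induced by a memoryless observation-based policy, and the names `P`, `R`, `beta`, `pi`, `gamma`, and all function names are hypothetical. The linear inequalities tested in `in_improvement_cone` cut out a polyhedral set of candidate policies, and the lemma guarantees that any candidate in that set has a state value function at least as large in every state.

```python
import numpy as np

def state_policy(beta, pi):
    """Effective state policy: sum_o beta[s, o] * pi[o, a], shape (S, A)."""
    return beta @ pi

def value_function(P, R, beta, pi, gamma):
    """Discounted state values V = (I - gamma * P_pi)^{-1} r_pi."""
    sp = state_policy(beta, pi)
    P_pi = np.einsum("sa,sat->st", sp, P)   # induced state transitions
    r_pi = np.einsum("sa,sa->s", sp, R)     # expected one-step reward
    n_states = P.shape[0]
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def q_function(P, R, V, gamma):
    """State-action values Q[s, a] = R[s, a] + gamma * sum_t P[s, a, t] * V[t]."""
    return R + gamma * np.einsum("sat,t->sa", P, V)

def in_improvement_cone(P, R, beta, pi, pi_new, gamma, tol=1e-12):
    """Check sum_a (beta @ pi_new)[s, a] * Q^pi[s, a] >= V^pi[s] for every state s.

    These constraints are linear in pi_new. By the classical policy
    improvement lemma they are sufficient for V^{pi_new} >= V^{pi}.
    """
    V = value_function(P, R, beta, pi, gamma)
    Q = q_function(P, R, V, gamma)
    gain = np.einsum("sa,sa->s", state_policy(beta, pi_new), Q) - V
    return bool(np.all(gain >= -tol))

# Hypothetical toy problem: 3 world states, 2 observations, 3 actions.
rng = np.random.default_rng(0)
S, O, A = 3, 2, 3
gamma = 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, t], rows sum to 1
R = rng.random((S, A))                       # R[s, a]
beta = rng.dirichlet(np.ones(O), size=S)     # beta[s, o], observation kernel
pi = rng.dirichlet(np.ones(A), size=O)       # pi[o, a], memoryless policy

V = value_function(P, R, beta, pi, gamma)
candidates = [rng.dirichlet(np.ones(A), size=O) for _ in range(200)]
improving = [c for c in candidates
             if in_improvement_cone(P, R, beta, pi, c, gamma)]
print(f"{len(improving)} of {len(candidates)} random candidates pass the check")
```

The check is sufficient but not necessary: a candidate policy outside this set may still improve the value in every state. Because the agent acts on observations rather than states, the constraints couple all states that share an observation, which is where the count of consistent world states enters.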

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.01785/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1704.01785/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/1704.01785/full.md

---
Source: https://tomesphere.com/paper/1704.01785