Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures

Marcus Hutter

arXiv:cs/0204040·cs.AI·May 23, 2007·20 cites

TL;DR

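This paper shows that, in unknown environments, the Bayes-optimal policy based on a mixture distribution is self-optimizing whenever the environment class admits self-optimizing policies, meaning its average value converges to the optimal average value. It also shows that the policy is Pareto-optimal: no other policy performs at least as well in every environment of the class and strictly better in at least one.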

Contribution

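It establishes that the Bayes-optimal policy derived from a mixture distribution is Pareto-optimal in general probabilistic environments, and self-optimizing whenever the environment class admits a self-optimizing policy at all, with no Markov or other structural assumptions on the environments.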

Findings

01

The average value of the Bayes-optimal policy $p^{ξ}$ converges asymptotically to the optimal average value attainable when the true environment is known in advance.

02

The necessary condition that the environment class admits self-optimizing policies at all is also sufficient for $p^{ξ}$ to be self-optimizing.

03

$p^{ξ}$ is Pareto-optimal: no other policy yields a value at least as high in every environment of the class and strictly higher in at least one.

Abstract

The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle $t$, action $y_{t}$ results in perception $x_{t}$ and reward $r_{t}$, where all quantities in general may depend on the complete history. The perception $x_{t}$ and reward $r_{t}$ are sampled from the (reactive) environmental probability distribution $μ$. This very general setting includes, but is not limited to, (partially observable, k-th order) Markov decision processes. Sequential decision theory tells us how to act in order to maximize the total expected reward, called value, if $μ$ is known. Reinforcement learning is usually used if $μ$ is unknown. In the Bayesian approach one defines a mixture distribution $ξ$ as a weighted sum of distributions $ν \in \mathcal{M}$, where $\mathcal{M}$ is any class of distributions including the true environment $μ$. We show that the Bayes-optimal policy $p^{ξ}$ based…
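The mixture construction described in the abstract can be illustrated with a minimal sketch. It assumes a toy class of three two-armed Bernoulli bandits with a uniform prior; the environment names, the reward probabilities, and all function names are hypothetical and chosen only for illustration. The paper itself treats far more general, history-dependent environments. The sketch computes the mixture probability of a reward as a weighted sum over the class and updates the weights by Bayes' rule after each observed (action, reward) pair.

```python
from typing import Dict, List, Tuple

# Hypothetical environment class: each environment is a two-armed Bernoulli
# bandit given by the probability of reward 1 for action 0 and action 1.
ENVIRONMENTS: Dict[str, Tuple[float, float]] = {
    "nu_a": (0.9, 0.1),
    "nu_b": (0.1, 0.9),
    "nu_c": (0.5, 0.5),
}


def likelihood(env: Tuple[float, float], action: int, reward: int) -> float:
    """Probability that environment `env` emits `reward` after `action`."""
    p = env[action]
    return p if reward == 1 else 1.0 - p


def mixture_prob(weights: Dict[str, float], action: int, reward: int) -> float:
    """Mixture probability: weighted sum of the class members' probabilities."""
    return sum(
        w * likelihood(ENVIRONMENTS[name], action, reward)
        for name, w in weights.items()
    )


def update(weights: Dict[str, float], action: int, reward: int) -> Dict[str, float]:
    """Bayes' rule: reweight each environment by how well it predicted the reward.

    The normalizer is nonzero here because every toy probability lies in (0, 1).
    """
    z = mixture_prob(weights, action, reward)
    return {
        name: w * likelihood(ENVIRONMENTS[name], action, reward) / z
        for name, w in weights.items()
    }


def posterior(history: List[Tuple[int, int]], prior: Dict[str, float]) -> Dict[str, float]:
    """Posterior weights after a history of (action, reward) pairs."""
    weights = dict(prior)
    for action, reward in history:
        weights = update(weights, action, reward)
    return weights


PRIOR = {name: 1.0 / len(ENVIRONMENTS) for name in ENVIRONMENTS}
```

Calling `posterior([(0, 1)], PRIOR)` shifts weight toward `nu_a`, the toy environment that makes a reward after action 0 most likely. A Bayes-optimal policy would then plan against the mixture rather than against any single environment; that planning step is not shown here.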

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms