Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

Reinaldo Uribe Muriel; Fernando Lozano; Charles Anderson

arXiv:1504.05122·cs.LG·April 21, 2015

Open Access

TL;DR

This paper introduces optimal nudging, a new method for solving average-reward semi-Markov decision processes by transforming them into a minimal sequence of cumulative reward problems, improving efficiency and stability.

Contribution

The paper presents a novel optimal nudging approach that reduces average-reward SMDPs to a small number of cumulative reward tasks, with a new gain update rule based on geometric analysis.

Findings

01. The method requires solving fewer cumulative-reward tasks than traditional approaches.

02. Optimal nudging demonstrates competitive performance in experiments.

03. The gain update rule minimizes uncertainty about the optimal gain in a minimax sense.

Abstract

This paper describes a novel method to solve average-reward semi-Markov decision processes by reducing them to a minimal sequence of cumulative-reward problems. The usual solution methods for this type of problem update the gain (optimal average reward) immediately after observing the result of taking an action. The alternative introduced, optimal nudging, relies instead on setting the gain to some fixed value, which temporarily makes the problem a cumulative-reward task, solving it by any standard reinforcement learning method, and only then updating the gain in a way that minimizes uncertainty in a minimax sense. The rule for optimal gain update is derived by exploiting the geometric features of the w-l space, a simple mapping of the space of policies. The total number of cumulative-reward tasks that need to be solved is shown to be small. Some experiments are presented to explore…
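The loop the abstract describes (fix the gain, solve the resulting cumulative task, then update the gain) can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's algorithm. Each policy is assumed to be summarized by a hypothetical pair (w, l), read here as expected reward and expected duration per episode. The "standard reinforcement learning method" is replaced by a brute-force search over a small policy list. The gain update is the simple ratio update w / l (a Dinkelbach-style step), swapped in for the paper's minimax rule, which the abstract does not spell out. All function and variable names are invented for the example.

```python
from typing import List, Tuple

# Hypothetical policy summary: (w, l) = (expected reward, expected duration)
# per episode. Reading the w-l space this way is an assumption of the sketch.
Policy = Tuple[float, float]


def solve_cumulative_task(policies: List[Policy], gain: float) -> Policy:
    """Stand-in for any RL solver: with the gain held fixed, return the
    policy that maximizes the nudged cumulative reward w - gain * l."""
    return max(policies, key=lambda p: p[0] - gain * p[1])


def solve_average_reward(policies: List[Policy], gain: float = 0.0,
                         tol: float = 1e-12, max_tasks: int = 100):
    """Alternate between solving a cumulative task and updating the gain.

    The update gain <- w / l is a simple ratio rule used for illustration,
    NOT the minimax update derived in the paper.
    """
    for tasks in range(1, max_tasks + 1):
        w, l = solve_cumulative_task(policies, gain)
        # When the best nudged value is zero, no policy beats the current
        # gain, so the gain equals the optimal average reward.
        if abs(w - gain * l) <= tol:
            return gain, (w, l), tasks
        gain = w / l  # requires l > 0 for every policy
    raise RuntimeError("gain did not converge within max_tasks")


# Toy policy set with average rewards 2.0, 3.0, 2.0 and 4.0.
toy_policies = [(10.0, 5.0), (6.0, 2.0), (3.0, 1.5), (1.0, 0.25)]
best_gain, best_policy, tasks_solved = solve_average_reward(toy_policies)
print(best_gain, best_policy, tasks_solved)
```

On this toy set the ratio update reaches the best average reward after a handful of cumulative tasks. The paper's contribution, per the abstract, is a gain update chosen so that the number of such tasks is small in a minimax sense.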

Taxonomy

Topics: Reinforcement Learning in Robotics · Supply Chain and Inventory Management · Advanced Bandit Algorithms Research