Active Measure Reinforcement Learning for Observation Cost Minimization

Colin Bellinger; Rory Coles; Mark Crowley; Isaac Tamblyn

arXiv:2005.12697 · cs.AI · May 27, 2020

TL;DR

This paper introduces Active Measure Reinforcement Learning (Amrl), in which agents learn to maximize return while minimizing the cost of observing the environment.

Contribution

The paper presents the Amrl framework, in which RL agents learn policies that maximize the costed return: the discounted sum of rewards minus the sum of observation costs.

Findings

01. Amrl-Q agents learn policies and state estimators in parallel.

02. Agents shift from costly measurements to estimators during training.

03. Amrl-Q achieves higher costed return than standard methods.

Abstract

Standard reinforcement learning (RL) algorithms assume that the observation of the next state comes instantaneously and at no cost. In a wide variety of sequential decision making tasks ranging from medical treatment to scientific discovery, however, multiple classes of state observations are possible, each of which has an associated cost. We propose the active measure RL framework (Amrl) as an initial solution to this problem where the agent learns to maximize the costed return, which we define as the discounted sum of rewards minus the sum of observation costs. Our empirical evaluation demonstrates that Amrl-Q agents are able to learn a policy and state estimator in parallel during online training. During training the agent naturally shifts from its reliance on costly measurements of the environment to its state estimator in order to increase its reward. It does this without harm to…
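The costed return defined in the abstract can be illustrated with a short calculation. The sketch below is not the authors' code: the function name `costed_return`, the default discount factor, and the choice to leave observation costs undiscounted (following the abstract's wording, "the sum of observation costs") are assumptions made for illustration.

```python
from typing import Sequence


def costed_return(
    rewards: Sequence[float],
    observation_costs: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor, not taken from the paper
) -> float:
    """Discounted sum of rewards minus the (undiscounted) sum of observation costs."""
    if len(rewards) != len(observation_costs):
        raise ValueError("rewards and observation_costs must have the same length")
    # Discount each reward by gamma raised to its time step.
    discounted_rewards = sum(gamma**t * r for t, r in enumerate(rewards))
    # Assumption: costs are summed without discounting, as the abstract words it.
    total_cost = sum(observation_costs)
    return discounted_rewards - total_cost


# Hypothetical three-step episode with identical rewards.
rewards = [1.0, 1.0, 1.0]
always_measure = costed_return(rewards, [0.5, 0.5, 0.5], gamma=0.5)  # 1.75 - 1.5 = 0.25
measure_once = costed_return(rewards, [0.5, 0.0, 0.0], gamma=0.5)    # 1.75 - 0.5 = 1.25
```

With equal rewards, the episode that pays for a measurement only once has the higher costed return. This is the incentive that would push an agent toward its state estimator once the estimator is accurate enough to preserve the reward.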

Taxonomy

Methods: Q-Learning