Trust Region Value Optimization using Kalman Filtering

Shirli Di-Castro Shashua; Shie Mannor

arXiv:1901.07860 · cs.LG · January 24, 2019 · 5 cites

TL;DR

This paper introduces KOVA, a Bayesian approach to optimizing value-function parameters for policy evaluation in reinforcement learning. It uses Kalman filtering to account for uncertainty in those parameters, which yields a measure of confidence in the resulting value estimates.

Contribution

It proposes KOVA, a trust-region optimization technique based on the Extended Kalman Filter (EKF) that takes distributional properties of the TD errors and of the value parameters into account when estimating value functions in RL.
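As a rough illustration of how a regularized objective can form a trust region over value parameters, here is one such objective under assumptions of our own: Gaussian noise on a batch of TD targets $y_t$ with covariance $P_n$, value predictions $h_t(\theta)$ for the sampled states, and a Gaussian prior on the parameters with covariance $P_{t\mid t-1}$. The notation is illustrative and not necessarily the paper's.

```latex
\begin{aligned}
\mathcal{L}_t(\theta) &= \tfrac{1}{2}\big(y_t - h_t(\theta)\big)^{\top} P_n^{-1}\big(y_t - h_t(\theta)\big)
  + \tfrac{1}{2}(\theta - \theta_{t-1})^{\top} P_{t\mid t-1}^{-1}(\theta - \theta_{t-1}),\\
\theta_t &= \theta_{t-1} + K_t\big(y_t - h_t(\theta_{t-1})\big),\\
K_t &= P_{t\mid t-1} H_t^{\top}\big(H_t P_{t\mid t-1} H_t^{\top} + P_n\big)^{-1}.
\end{aligned}
```

The first term is a covariance-weighted sum of squared TD errors; with $P_n = I$ and without the second term it reduces to the ordinary squared-TD-error objective. The second term penalizes moving away from $\theta_{t-1}$ in directions where the parameters are already well determined. Linearizing $h_t$ around $\theta_{t-1}$ with Jacobian $H_t$ makes the objective quadratic, and its minimizer is the second line, which has the form of an EKF measurement update.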

Findings

1. KOVA effectively estimates value functions with uncertainty quantification.

2. The method performs well in domains with large state and action spaces.

3. Theoretical analysis supports the robustness of the approach.

Abstract

Policy evaluation is a key process in reinforcement learning. It assesses a given policy by estimating the corresponding value function. When a parameterized function is used to approximate the value, it is common to optimize the set of parameters by minimizing the sum of squared Bellman temporal difference (TD) errors. However, this approach ignores certain distributional properties of both the errors and the value parameters. Taking these distributions into account in the optimization process can provide useful information on the amount of confidence in value estimation. In this work, we propose to optimize the value by minimizing a regularized objective function which forms a trust region over its parameters. We present a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective function…
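The abstract contrasts plain least squares on TD errors with an EKF-based update that minimizes a regularized objective. Below is a minimal sketch of a Kalman-style update for a linear value function $V(s) = \phi(s)^\top\theta$, written for illustration only. The function names (`td_targets`, `regularized_objective`, `kalman_value_update`), the random-walk evolution model, the TD(0) targets bootstrapped once and then held fixed, and the isotropic noise parameters `obs_noise` and `evo_noise` are all hypothetical choices of ours, not KOVA's published algorithm. Because the model is linear, the EKF reduces to the ordinary Kalman filter, and the update lands exactly on the minimizer of the regularized objective.

```python
import numpy as np

# Illustrative sketch: names and modeling choices are hypothetical,
# not KOVA's exact algorithm.


def td_targets(theta, phi_next, rewards, gamma):
    # TD(0) targets r + gamma * V(s'), bootstrapped with the current
    # parameters and then held fixed during the update (assumption).
    return rewards + gamma * phi_next @ theta


def regularized_objective(theta, theta_prev, P_prior, phi, y, obs_noise):
    # Squared TD errors weighted by the observation noise, plus a Mahalanobis
    # penalty that keeps theta inside a trust region around theta_prev.
    resid = y - phi @ theta
    drift = theta - theta_prev
    return 0.5 * resid @ resid / obs_noise + 0.5 * drift @ np.linalg.solve(P_prior, drift)


def kalman_value_update(theta, P, phi, phi_next, rewards, gamma=0.99,
                        obs_noise=1.0, evo_noise=1e-3):
    """One Kalman update of linear value parameters V(s) = phi(s) @ theta.

    theta: (d,) parameters, P: (d, d) covariance, phi / phi_next: (N, d)
    features of s_i / s'_i, rewards: (N,). Returns (theta_new, P_new).
    """
    d, n = theta.shape[0], phi.shape[0]
    # Prediction step: random-walk evolution theta_t = theta_{t-1} + noise.
    P_prior = P + evo_noise * np.eye(d)
    y = td_targets(theta, phi_next, rewards, gamma)
    # Linear observation model h(theta) = phi @ theta, so its Jacobian is phi.
    H = phi
    S = H @ P_prior @ H.T + obs_noise * np.eye(n)  # innovation covariance
    K = np.linalg.solve(S, H @ P_prior).T          # gain P_prior H^T S^{-1}
    theta_new = theta + K @ (y - H @ theta)
    P_new = (np.eye(d) - K @ H) @ P_prior
    return theta_new, P_new


# Usage on random synthetic features (illustrative data only).
rng = np.random.default_rng(0)
d, n = 4, 32
theta, P = np.zeros(d), np.eye(d)
phi, phi_next = rng.normal(size=(n, d)), rng.normal(size=(n, d))
rewards = rng.normal(size=n)
theta_new, P_new = kalman_value_update(theta, P, phi, phi_next, rewards)
```

The covariance `P_new` shrinks in directions the sampled data constrains, which is one way a method of this kind can report confidence in its value estimates. For a nonlinear approximator such as a neural network, the Jacobian of the value predictions with respect to the parameters would take the place of `phi`, which is where the "extended" part of the EKF comes in.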

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms