Trust Region Value Optimization using Kalman Filtering
Shirli Di-Castro Shashua, Shie Mannor

TL;DR
This paper introduces KOVA, an optimization method for policy evaluation in reinforcement learning that takes a Bayesian view of the value-function parameters and uses Kalman filtering to account for their uncertainty, providing a measure of confidence in the value estimates.
Contribution
It proposes KOVA, a trust-region optimization technique based on the Extended Kalman Filter (EKF) that takes distributional properties of the Bellman errors and the value parameters into account when estimating value functions in RL.
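To make "a trust region over the value parameters" concrete, here is a hedged sketch of the generic regularized least-squares objective that a single Kalman-filter update minimizes. The notation is assumed for illustration and is not necessarily the paper's exact formulation: $y_t$ are observed targets, $h_t(\theta)$ the value predictions, $R_t$ an observation-noise covariance, and $\theta_{t-1}$, $P_{t-1}$ the previous parameter estimate and its covariance.

```latex
\mathcal{L}_t(\theta)
  = \tfrac{1}{2}\,\big(y_t - h_t(\theta)\big)^{\top} R_t^{-1} \big(y_t - h_t(\theta)\big)
  + \tfrac{1}{2}\,\big(\theta - \theta_{t-1}\big)^{\top} P_{t-1}^{-1} \big(\theta - \theta_{t-1}\big)
```

The first term is a noise-weighted sum of squared errors. When $y_t$ are TD targets, dropping the second term and setting $R_t = \sigma^2 I$ recovers the usual sum of squared TD errors, scaled by $1/(2\sigma^2)$. The second term penalizes moving away from the previous estimate, more strongly in directions where $P_{t-1}$ indicates high confidence. This is what lets the regularizer act as a trust region.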
Findings
KOVA effectively estimates value functions with uncertainty quantification.
The method performs well in large state and action space domains.
Theoretical analysis supports the robustness of the approach.
Abstract
Policy evaluation is a key process in reinforcement learning. It assesses a given policy by estimating the corresponding value function. When a parameterized function is used to approximate the value, it is common to optimize the parameters by minimizing the sum of squared Bellman temporal-difference (TD) errors. However, this approach ignores certain distributional properties of both the errors and the value parameters. Taking these distributions into account in the optimization process can provide useful information on the amount of confidence in value estimation. In this work, we propose to optimize the value by minimizing a regularized objective function that forms a trust region over its parameters. We present a novel optimization method, the Kalman Optimization for Value Approximation (KOVA), based on the Extended Kalman Filter. KOVA minimizes the regularized objective function…
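The following is a minimal sketch of how a Kalman-filter step on value parameters relates to a regularized objective. It assumes a linear value model V(s) = phi(s)^T theta (so the Extended Kalman Filter reduces to an ordinary Kalman filter), Gaussian observation noise, and one-step TD targets held fixed during the update. The functions `kalman_value_update` and `regularized_minimizer` and the toy data are hypothetical, and this is not the authors' implementation. The sketch performs one Kalman update and checks that the result equals the minimizer of the noise-weighted squared errors plus a quadratic penalty on distance from the prior parameters.

```python
import numpy as np


def kalman_value_update(theta_prior, P_prior, Phi, y, R):
    """One Kalman step for a (hypothetical) linear value model V(s) = phi(s) @ theta.

    theta_prior: (d,) prior mean of the value parameters
    P_prior:     (d, d) prior covariance (sets the trust-region shape)
    Phi:         (N, d) features; for a linear model this is also the Jacobian H
    y:           (N,) observed targets, here one-step TD targets
    R:           (N, N) observation-noise covariance (assumed Gaussian)
    """
    H = Phi                                   # Jacobian of h(theta) = Phi @ theta
    S = H @ P_prior @ H.T + R                 # innovation covariance
    K = np.linalg.solve(S, H @ P_prior).T     # Kalman gain P H^T S^{-1} (P, S symmetric)
    theta_post = theta_prior + K @ (y - H @ theta_prior)
    P_post = (np.eye(len(theta_prior)) - K @ H) @ P_prior
    return theta_post, P_post


def regularized_minimizer(theta_prior, P_prior, Phi, y, R):
    """Closed-form argmin of 0.5*||y - Phi theta||^2_{R^-1} + 0.5*||theta - theta_prior||^2_{P^-1}."""
    R_inv = np.linalg.inv(R)
    P_inv = np.linalg.inv(P_prior)
    A = Phi.T @ R_inv @ Phi + P_inv
    b = Phi.T @ R_inv @ y + P_inv @ theta_prior
    return np.linalg.solve(A, b)


# Toy setup (assumed): random features for states s and next states s'
rng = np.random.default_rng(0)
N, d, gamma = 8, 3, 0.9
Phi = rng.normal(size=(N, d))
Phi_next = rng.normal(size=(N, d))
rewards = rng.normal(size=N)
theta0 = np.zeros(d)
P0 = np.eye(d)
R = 0.5 * np.eye(N)

# One-step TD targets computed with the prior parameters and held fixed
y = rewards + gamma * Phi_next @ theta0
theta1, P1 = kalman_value_update(theta0, P0, Phi, y, R)
theta_star = regularized_minimizer(theta0, P0, Phi, y, R)
print(np.allclose(theta1, theta_star))  # True
```

In this sketch, the Kalman form needs only an N x N solve and also returns the posterior covariance `P1`, which serves as the parameter-uncertainty estimate. The closed-form minimizer is included only as a check. For a nonlinear value model, an EKF would replace `Phi` with the Jacobian of the value predictions at the prior mean, so the equivalence would hold only approximately.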
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms
