Parameterized Projected Bellman Operator
Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

TL;DR
This paper introduces the projected Bellman operator (PBO), a novel approach that learns an approximation of the Bellman operator instead of estimating it from transition samples. The goal is to make reinforcement learning more efficient by reducing the impact of uninformative samples and by avoiding computationally heavy projection steps.
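To illustrate what learning the operator, rather than re-estimating it at every step, can look like, here is a minimal sketch under strong simplifying assumptions. The Q-function is linear in tabular features, and the learned operator is a hypothetical affine map Λ(θ) = Aθ + b on those parameters. It is fitted once by least squares so that Q_{Λ(θ)} matches the empirical Bellman targets of Q_θ over a batch of parameter vectors. This parameterization, the toy MDP, and every function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical toy MDP (2 states, 2 actions): action a moves the agent to state a,
# and the reward is 1 whenever the agent lands in state 1.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9
SAMPLES = [(s, a, 1.0 if a == 1 else 0.0, a) for s in range(N_STATES) for a in range(N_ACTIONS)]


def one_hot(s, a):
    """Tabular features, so that Q_theta(s, a) = theta[s * N_ACTIONS + a]."""
    v = np.zeros(N_STATES * N_ACTIONS)
    v[s * N_ACTIONS + a] = 1.0
    return v


def bellman_targets(theta, samples, gamma):
    """Empirical optimal Bellman operator applied to Q_theta at each sample: r + gamma * max_b Q_theta(s', b)."""
    return np.array([r + gamma * max(one_hot(s2, b) @ theta for b in range(N_ACTIONS))
                     for _, _, r, s2 in samples])


def learn_linear_operator(thetas, samples, gamma):
    """Fit the hypothetical operator Lambda(theta) = A @ theta + b so that Q_{Lambda(theta)}
    matches the Bellman targets of Q_theta, jointly over a batch of parameter vectors."""
    d = thetas.shape[1]
    rows, targets = [], []
    for theta in thetas:
        y = bellman_targets(theta, samples, gamma)
        for (s, a, _, _), y_j in zip(samples, y):
            phi = one_hot(s, a)
            # phi @ (A @ theta + b) == kron(phi, theta) @ A.ravel() + phi @ b
            rows.append(np.concatenate([np.kron(phi, theta), phi]))
            targets.append(y_j)
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return w[:d * d].reshape(d, d), w[d * d:]


rng = np.random.default_rng(0)
# Assumption: parameters are sampled where action 1 is greedy in both states,
# so the Bellman operator is exactly affine there and an affine Lambda can represent it.
thetas = np.array([9.0, 10.0, 9.0, 10.0]) + rng.uniform(-0.4, 0.4, size=(20, 4))
A, b = learn_linear_operator(thetas, SAMPLES, GAMMA)

# Once learned, iterating the operator needs no samples and no regression.
theta = np.zeros(4)
for _ in range(200):
    theta = A @ theta + b
```

After the one-off fit, each iteration is a single matrix-vector product, and the iterates approach Q* = [9, 10, 9, 10] for γ = 0.9. The affine form is exact here only because the max in the Bellman operator always selects action 1 in this toy setting; a general operator would need a more expressive parameterization.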
Contribution
The paper proposes PBO, a new operator that generalizes across transition samples and avoids the costly projection step; PBO is analyzed theoretically and validated empirically in RL settings.
Findings
PBO outperforms the traditional Bellman operator on several RL tasks.
Theoretical analysis confirms PBO's convergence properties.
Empirical results demonstrate improved learning efficiency.
Abstract
Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii)…
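The two-step AVI procedure described in the abstract can be sketched as fitted Q-iteration, one standard AVI instance, with a linear Q-function. Each iteration (i) computes sample-based Bellman targets and (ii) projects them onto the feature space by least squares. The feature map, the toy MDP, and the function names below are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def fitted_q_iteration(samples, features, n_actions, gamma=0.9, n_iters=50):
    """Approximate value iteration with a linear Q-function Q_theta(s, a) = features(s, a) @ theta.

    samples:  list of transitions (s, a, r, s_next)
    features: callable mapping (s, a) to a 1-D feature vector of length d
    """
    phi = np.array([features(s, a) for s, a, _, _ in samples])        # (n, d)
    rewards = np.array([r for _, _, r, _ in samples], dtype=float)    # (n,)
    # Features of every action in each next state: shape (n, n_actions, d)
    phi_next = np.array([[features(s2, b) for b in range(n_actions)]
                         for _, _, _, s2 in samples])
    theta = np.zeros(phi.shape[1])
    for _ in range(n_iters):
        # (i) Empirical Bellman operator: r + gamma * max_b Q_theta(s', b)
        targets = rewards + gamma * (phi_next @ theta).max(axis=1)
        # (ii) Projection: least-squares fit of the targets within the linear function space
        theta, *_ = np.linalg.lstsq(phi, targets, rcond=None)
    return theta


# Hypothetical toy MDP: 2 states, 2 actions; action a moves the agent to state a,
# and the reward is 1 whenever the agent lands in state 1.
def one_hot(s, a):
    v = np.zeros(4)
    v[2 * s + a] = 1.0
    return v


samples = [(s, a, 1.0 if a == 1 else 0.0, a) for s in range(2) for a in range(2)]
theta = fitted_q_iteration(samples, one_hot, n_actions=2, gamma=0.9, n_iters=200)
```

With one-hot (tabular) features the projection is exact, so theta converges to Q* = [9, 10, 9, 10] for γ = 0.9. Note that the regression in step (ii) is re-solved at every iteration and only ever sees the fixed batch of transitions, which is where the sample dependence and projection cost discussed in the abstract arise.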
Taxonomy
Topics: Neural Networks and Applications
