Parameterized Projected Bellman Operator

Théo Vincent; Alberto Maria Metelli; Boris Belousov; Jan Peters; Marcello Restelli; Carlo D'Eramo

arXiv:2312.12869·cs.LG·March 7, 2024·1 cites

TL;DR

This paper introduces the projected Bellman operator (PBO), which learns an approximate Bellman operator instead of estimating it from transition samples. The aim is to make reinforcement learning more efficient by reducing the impact of uninformative samples and avoiding computationally heavy projection steps.

Contribution

The paper proposes PBO, a learned operator that generalizes Bellman updates and reduces computational cost, and supports it with theoretical analysis and empirical validation in RL settings.
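
To make the idea of learning an operator concrete, here is a minimal sketch. It assumes, as the title suggests, that the learned operator acts directly on the parameters of a value function. Everything else is a simplification chosen for illustration and is not taken from the paper: the toy Markov reward process, the names `bellman_targets` and `learned_operator`, the tabular weights (which make the projection trivial), and the use of a fixed policy so that the Bellman operator is affine in the weights and an affine map can represent it exactly. The optimal Bellman operator involves a max and would not be exactly affine.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA = 0.9
N_STATES = 4

# Hypothetical Markov reward process (fixed policy, not from the paper):
# state s always moves to (s + 1) % N_STATES, reward 1 on arriving in state 0.
next_state = np.array([(s + 1) % N_STATES for s in range(N_STATES)])
reward = (next_state == 0).astype(float)

def bellman_targets(w):
    """Sample-based Bellman operator for tabular weights w (one weight per state)."""
    return reward + GAMMA * w[next_state]

# Training set: random weight vectors paired with their Bellman targets.
W = rng.normal(size=(50, N_STATES))
Y = np.stack([bellman_targets(w) for w in W])

# Fit an affine operator w -> A w + b on the weights by least squares.
X = np.hstack([W, np.ones((len(W), 1))])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
A, b = coef[:-1].T, coef[-1]

def learned_operator(w):
    """Learned stand-in for one Bellman update, applied in parameter space."""
    return A @ w + b

# Once fitted, the operator can be iterated without touching the samples again.
w = np.zeros(N_STATES)
for _ in range(300):
    w = learned_operator(w)
```

In this toy setting the fitted map reproduces the Bellman update on weight vectors it has not seen, and iterating it approaches the fixed point of the Bellman equation for the chosen process.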

Findings

01 PBO outperforms the traditional Bellman operator in several RL tasks.

02 Theoretical analysis confirms PBO's convergence properties.

03 Empirical results demonstrate improved learning efficiency.

Abstract

Approximate value iteration (AVI) is a family of algorithms for reinforcement learning (RL) that aims to obtain an approximation of the optimal value function. Generally, AVI algorithms implement an iterated procedure where each step consists of (i) an application of the Bellman operator and (ii) a projection step into a considered function space. Notoriously, the Bellman operator leverages transition samples, which strongly determine its behavior, as uninformative samples can result in negligible updates or long detours, whose detrimental effects are further exacerbated by the computationally intensive projection step. To address these issues, we propose a novel alternative approach based on learning an approximate version of the Bellman operator rather than estimating it through samples as in AVI approaches. This way, we are able to (i) generalize across transition samples and (ii)…
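
The two-step AVI iteration described in the abstract, (i) applying the Bellman operator on samples and (ii) projecting onto a function space, can be sketched as follows. This is an illustrative sketch only: the 5-state chain MDP, the one-hot features, and the names `step`, `features`, and `avi_iteration` are assumptions made here, and least squares stands in for the projection step.

```python
import numpy as np

# Hypothetical 5-state chain MDP (not from the paper): action 0 = left,
# action 1 = right, reward 1 for arriving in the rightmost state.
N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return reward, s_next

def features(s, a):
    # One-hot features over (state, action), so the function space is tabular.
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

# Fixed batch of transitions (s, a, r, s') covering each state-action pair once.
batch = [(s, a, *step(s, a)) for s in range(N_STATES) for a in range(N_ACTIONS)]
Phi = np.stack([features(s, a) for s, a, _, _ in batch])

def q_values(w, s):
    return np.array([features(s, a) @ w for a in range(N_ACTIONS)])

def avi_iteration(w):
    # (i) Apply the empirical Bellman operator on the sampled transitions.
    targets = np.array([r + GAMMA * q_values(w, s2).max() for _, _, r, s2 in batch])
    # (ii) Project the targets onto the function space (least-squares fit).
    w_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return w_new

w = np.zeros(N_STATES * N_ACTIONS)
for _ in range(200):
    w = avi_iteration(w)

greedy_policy = [int(q_values(w, s).argmax()) for s in range(N_STATES)]
```

Every iteration recomputes the targets from the same batch and solves a new regression problem, which is the per-step cost the abstract attributes to the projection step.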

Code & Models

Repositories

theovincent/pbo (JAX, official)

Videos

Parameterized Projected Bellman Operator · Underline

Taxonomy

Topics: Neural Networks and Applications