Fitted Q-Learning for Relational Domains

Srijita Das; Sriraam Natarajan; Kaushik Roy; Ronald Parr; Kristian Kersting

arXiv:2006.05595·cs.LG·June 11, 2020·6 cites

Open Access

TL;DR

This paper introduces relational fitted Q-learning algorithms for approximate dynamic programming in relational domains, using gradient boosting to learn value functions without a domain model and from fewer training trajectories.

Contribution

The paper develops what the authors describe as the first relational fitted Q-learning algorithms, using gradient boosting to carry out the application and projection steps of the Bellman operator in relational settings.

Findings

01 Performs reasonably well on standard domains

02 Uses fewer training trajectories

03 Does not rely on domain models

Abstract

We consider the problem of Approximate Dynamic Programming in relational domains. Inspired by the success of fitted Q-learning methods in propositional settings, we develop the first relational fitted Q-learning algorithms by representing the value function and Bellman residuals. When we fit the Q-functions, we show how the two steps of the Bellman operator, application and projection, can be performed using a gradient-boosting technique. Our proposed framework performs reasonably well on standard domains without using domain models and while using fewer training trajectories.
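To make the application and projection steps concrete, here is a minimal propositional sketch of fitted Q-iteration in which each iteration fits one regression tree to the Bellman residuals and adds it to the Q-function, in the spirit of gradient boosting. It is not the paper's algorithm: the paper works with relational representations, whereas this sketch assumes plain feature vectors `(s, a)`, ordinary regression trees, and a hypothetical 5-state chain domain. All names (`step`, `fit_tree`, `predict`, `ensemble`) are illustrative.

```python
GAMMA = 0.9
GOAL = 4  # terminal state of a hypothetical 5-state chain (assumption)

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left."""
    s2 = s + 1 if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == GOAL else 0.0), s2

# Fixed batch of transitions (s, a, r, s'); learning uses only this batch,
# never the dynamics themselves.
batch = [(s, a, *step(s, a)) for s in range(GOAL) for a in (0, 1)]

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, depth=4):
    """Greedy least-squares regression tree: a float leaf or (f, t, left, right)."""
    mean = sum(y) / len(y)
    if depth == 0 or max(y) == min(y):
        return mean
    best = None
    for f in range(len(X[0])):
        vals = sorted(set(x[f] for x in X))
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2
            left = [i for i, x in enumerate(X) if x[f] <= t]
            right = [i for i, x in enumerate(X) if x[f] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:  # no feature separates the points
        return mean
    _, f, t, left, right = best
    return (f, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], depth - 1))

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

ensemble = []  # Q(s, a) is the sum of all trees fitted so far

def q(s, a):
    return sum(predict(tree, (s, a)) for tree in ensemble)

for _ in range(10):
    X, residuals = [], []
    for s, a, r, s2 in batch:
        # Application step: one-step Bellman backup on the sampled transition.
        target = r if s2 == GOAL else r + GAMMA * max(q(s2, b) for b in (0, 1))
        X.append((s, a))
        residuals.append(target - q(s, a))  # Bellman residual
    # Projection step: fit a tree to the residuals and add it to the ensemble.
    ensemble.append(fit_tree(X, residuals))

policy = {s: max((0, 1), key=lambda a: q(s, a)) for s in range(GOAL)}
```

In this toy setting the trees are deep enough to fit the eight training points exactly, so each boosting step reproduces one sweep of value iteration on the batch, and the greedy `policy` moves right in every state. With shallower trees or a relational learner the projection would only be approximate.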

Taxonomy

Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · AI-based Problem Solving and Planning

Methods: Q-Learning