The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning

Andrea Soltoggio; Eseoghene Ben-Iwhiwhu; Christos Peridis; Pawel Ladosz; Jeffery Dick; Praveen K. Pilly; Soheil Kolouri

arXiv:2302.10887 · cs.LG · February 23, 2023

TL;DR

This paper presents the CT-graph, a flexible, mathematically defined environment for testing reinforcement learning algorithms under complex, partially observable, and hierarchical reward scenarios, facilitating systematic comparisons in lifelong learning.

Contribution

It introduces a configurable, transparent environment with hierarchical structure and variable complexity for benchmarking reinforcement learning algorithms in lifelong learning contexts.

Findings

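01 The environment supports variable degrees of observability and variable, hierarchical reward structures.

02 Branching factor, depth, and observation sets can be varied to grow problem complexity in a controlled, measurable way for scalability tests.

03 The environment facilitates comparison of RL algorithms in dynamic, multi-task settings.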
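
To make the idea of measurable complexity concrete, the following minimal sketch counts the end points of a full tree and the chance that a uniformly random policy reaches one specific end point. The function names are hypothetical and are not taken from the paper or its repository. The sketch assumes every decision state has the same number of branches, exactly one leaf is rewarded, and only the choices made at decision states matter (wait states and invalid actions are ignored).

```python
def num_leaves(branching_factor: int, depth: int) -> int:
    """Number of leaves in a full tree with the given branching factor and depth."""
    if branching_factor < 1 or depth < 0:
        raise ValueError("branching_factor must be >= 1 and depth must be >= 0")
    return branching_factor ** depth


def random_policy_success(branching_factor: int, depth: int) -> float:
    """Probability that uniformly random branch choices reach one specific leaf.

    Assumption: a single rewarded leaf and one uniform choice per decision state.
    """
    return 1.0 / num_leaves(branching_factor, depth)


if __name__ == "__main__":
    # Complexity grows exponentially in depth for a fixed branching factor.
    for depth in range(1, 6):
        print(depth, num_leaves(2, depth), random_policy_success(2, depth))
```

Under these assumptions, increasing depth by one multiplies the number of candidate reward locations by the branching factor, which is one simple way to quantify how hard the sparse-reward search becomes.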

Abstract

This paper introduces a set of formally defined and transparent problems for reinforcement learning algorithms with the following characteristics: (1) variable degrees of observability (non-Markov observations), (2) distal and sparse rewards, (3) variable and hierarchical reward structure, (4) multiple-task generation, (5) variable problem complexity. The environment provides 1D or 2D categorical observations, and takes actions as input. The core structure of the CT-graph is a multi-branch tree graph with arbitrary branching factor, depth, and observation sets that can be varied to increase the dimensions of the problem in a controllable and measurable way. Two main categories of states, decision states and wait states, are devised to create a hierarchy of importance among observations, typical of real-world problems. A large observation set can produce a vast set of histories that…
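
The structure described in the abstract can be illustrated with a toy environment. This is a simplified sketch, not the API of the official repository: the class `ToyTreeGraph`, the observation codes, and the action convention (action 0 advances through wait states, actions 1..b select a branch at decision states, and any other action ends the episode in a fail state) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Categorical observation codes (hypothetical; chosen for this sketch only).
WAIT_OBS, DECISION_OBS, END_OBS, FAIL_OBS = 0, 1, 2, 3


@dataclass
class ToyTreeGraph:
    branching_factor: int = 2
    depth: int = 2                       # number of decision states on each path
    wait_steps: int = 1                  # wait states before each decision state
    goal_path: Tuple[int, ...] = (0, 1)  # branch indices leading to the rewarded leaf
    path: List[int] = field(default_factory=list)
    waited: int = 0
    done: bool = False

    def reset(self) -> int:
        self.path, self.waited, self.done = [], 0, False
        return self._observe()

    def _observe(self) -> int:
        # The observation reveals only the state type, not the position in the
        # tree, so a single observation is non-Markov.
        return WAIT_OBS if self.waited < self.wait_steps else DECISION_OBS

    def step(self, action: int) -> Tuple[int, float, bool]:
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        if self.waited < self.wait_steps:            # currently in a wait state
            if action != 0:
                self.done = True
                return FAIL_OBS, 0.0, True
            self.waited += 1
            return self._observe(), 0.0, False
        if not 1 <= action <= self.branching_factor:  # invalid at a decision state
            self.done = True
            return FAIL_OBS, 0.0, True
        self.path.append(action - 1)
        self.waited = 0
        if len(self.path) == self.depth:              # reached a leaf
            self.done = True
            reward = 1.0 if tuple(self.path) == self.goal_path else 0.0
            return END_OBS, reward, True
        return self._observe(), 0.0, False


def run_episode(env: ToyTreeGraph, actions: List[int]) -> float:
    """Play a fixed action sequence and return the total reward collected."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total
```

In this sketch the only non-zero reward arrives at the end of the correct path, which mirrors the distal and sparse reward setting. Changing `goal_path` yields a different task on the same graph, a simple stand-in for multiple-task generation.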

Code & Models

Repositories

soltoggio/ct-graph (pytorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research · Advanced Software Engineering Methodologies