A Subgame Perfect Equilibrium Reinforcement Learning Approach to Time-inconsistent Problems

Nixie S. Lesmana; Chi Seng Pun

arXiv:2110.14295·cs.LG·October 28, 2021

Open Access

TL;DR

This paper introduces SPERL, a reinforcement learning framework that solves time-inconsistent (TIC) problems by seeking a subgame perfect equilibrium policy. It addresses the two main challenges of TIC problems with a new class of backward policy iteration (BPI) algorithms and demonstrates the approach on a mean-variance portfolio problem.

Contribution

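The paper develops the SPERL framework and the BPI class of algorithms, adapting extended dynamic programming theory to handle time-inconsistency in RL, where Bellman's principle of optimality fails and standard policy iteration lacks a provable policy improvement theorem.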

Findings

01. BPI effectively solves TIC problems in RL.

02. The framework demonstrates convergence in portfolio selection.

03. Model identifiability is improved with the new approach.

Abstract

In this paper, we establish a subgame perfect equilibrium reinforcement learning (SPERL) framework for time-inconsistent (TIC) problems. In the context of RL, TIC problems are known to face two main challenges: the non-existence of natural recursive relationships between value functions at different time points and the violation of Bellman's principle of optimality that raises questions on the applicability of standard policy iteration algorithms for unprovable policy improvement theorems. We adapt an extended dynamic programming theory and propose a new class of algorithms, called backward policy iteration (BPI), that solves SPERL and addresses both challenges. To demonstrate the practical usage of BPI as a training framework, we adapt standard RL simulation methods and derive two BPI-based training algorithms. We examine our derived training frameworks on a mean-variance portfolio…
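The first challenge named in the abstract can be made concrete with a small example. Under a mean-variance objective, the variance of terminal wealth does not satisfy a one-step recursion on its own, but the first and second moments do. The Python sketch below is a minimal illustration under stated assumptions, not the paper's BPI algorithm: it assumes a fully known toy model and computes an equilibrium-style policy by backward induction, where the agent at each time picks the action that is best for its own objective given that later selves follow the actions already computed. The names `transitions`, `solve`, `HORIZON`, and `RISK_AVERSION`, and all payoff numbers, are hypothetical.

```python
from functools import lru_cache

# Hypothetical toy model: integer wealth, two actions per step.
HORIZON = 3          # number of decision steps (assumption)
RISK_AVERSION = 0.1  # weight on variance in the objective (assumption)
ACTIONS = (0, 1)     # 0 = safe asset, 1 = risky asset


def transitions(wealth, action):
    """Return a list of (probability, next_wealth) pairs for the toy model."""
    if action == 0:
        return [(1.0, wealth)]                      # safe: wealth unchanged
    return [(0.5, wealth + 2), (0.5, wealth - 1)]   # risky: +2 or -1


@lru_cache(maxsize=None)
def solve(t, wealth):
    """Backward induction over moments of terminal wealth.

    Returns (action, mean, second_moment), where the moments are those of
    terminal wealth when the agent at (t, wealth) takes `action` and all
    later selves follow their own already-computed actions.
    """
    if t == HORIZON:
        return None, float(wealth), float(wealth ** 2)

    best = None
    for action in ACTIONS:
        # Both moments obey a linear one-step recursion; the variance does not.
        mean = sum(p * solve(t + 1, w)[1] for p, w in transitions(wealth, action))
        second = sum(p * solve(t + 1, w)[2] for p, w in transitions(wealth, action))
        # Mean-variance objective as seen by the time-t self.
        score = mean - RISK_AVERSION * (second - mean ** 2)
        if best is None or score > best[0]:
            best = (score, action, mean, second)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    action, mean, second = solve(0, 0)
    print("time-0 action:", action)
    print("mean:", mean, "variance:", second - mean ** 2)
```

Calling `solve(t, wealth)` returns the action chosen by the time-`t` self together with the two moments needed by earlier selves. The sketch relies on knowing `transitions` exactly; a learning-based method would have to estimate these quantities from simulated or observed trajectories instead.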

Taxonomy

Topics: Reinforcement Learning in Robotics