Reinforcement Learning with History-Dependent Dynamic Contexts

Guy Tennenholtz; Nadav Merlis; Lior Shani; Martin Mladenov; Craig Boutilier

arXiv:2302.02061·cs.LG·May 19, 2023

Open Access

TL;DR

This paper introduces Dynamic Contextual Markov Decision Processes (DCMDPs), a reinforcement learning framework for history-dependent environments in which contexts change over time. It gives an algorithm with regret bounds for the logistic special case and a practical model-based algorithm evaluated on a recommendation task.

Contribution

The paper generalizes the contextual MDP framework to non-Markov environments, proposes an upper-confidence-bound style algorithm with regret bounds for logistic DCMDPs, and demonstrates a practical model-based algorithm on MovieLens recommendation data.

Findings

01. Derived regret bounds for the proposed algorithms

02. Developed a practical planning algorithm in latent space

03. Validated approach on MovieLens recommendation data

Abstract

We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in…
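
As a rough illustration of how an aggregation function can avoid an exponential dependence on history length, the sketch below folds a sequence of per-step context scores into a fixed-size vector and maps it to a distribution over the next context with a softmax. This is a minimal sketch under stated assumptions, not the paper's definition: the discounted-sum aggregator, the discount `alpha`, and the names `aggregate_history` and `next_context_probs` are all hypothetical, and the per-step scores stand in for whatever features of state, action, and context the model actually uses.

```python
import math
from typing import List, Sequence


def aggregate_history(step_scores: Sequence[Sequence[float]], alpha: float) -> List[float]:
    """Fold per-step score vectors (oldest first) into one vector.

    ASSUMPTION: a discounted sum is used as the aggregation function,
    so older steps are down-weighted by powers of alpha. Only a
    fixed-size vector is carried forward, whatever the history length.
    """
    if not step_scores:
        raise ValueError("history must contain at least one step")
    agg = [0.0] * len(step_scores[0])
    for scores in step_scores:
        # Recursive update: discount the running aggregate, add the new step.
        agg = [alpha * a + s for a, s in zip(agg, scores)]
    return agg


def next_context_probs(step_scores: Sequence[Sequence[float]], alpha: float) -> List[float]:
    """Softmax over aggregated scores (ASSUMED logistic-style transition)."""
    logits = aggregate_history(step_scores, alpha)
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-step history over three candidate contexts.
history = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
probs = next_context_probs(history, alpha=0.5)
```

Because the aggregate is updated recursively, the cost of computing the next-context distribution grows linearly with the number of steps, and an online implementation would only need to store the running vector.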

Taxonomy

Topics: Smart Grid Energy Management · Data Stream Mining Techniques · Context-Aware Activity Recognition Systems