Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

Tongxin Li; Yiheng Lin; Shaolei Ren; Adam Wierman

arXiv:2307.10524·cs.LG·October 31, 2023

TL;DR

This paper introduces learning-augmented algorithms for MDPs that leverage Q-value advice with known generation processes, achieving a novel balance between consistency and robustness and improving performance over black-box advice methods.

Contribution

It presents the first tradeoff analysis for Q-value advice in MDPs with known advice generation, enabling dynamic selection between advice and robust baselines.

Findings

01 Achieves near-optimal performance guarantees.

02 Demonstrates improved tradeoff over black-box advice.

03 Applicable to both continuous and discrete MDPs.

Abstract

We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus resulting in near-optimal performance guarantees that provably improve on what can be obtained solely with black-box advice.
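To make the consistency/robustness idea concrete, here is a minimal sketch of one way an agent could hedge between an untrusted advised action and a robust baseline action. It is not taken from the paper: the abstract does not specify the mechanism, so the function names `project_to_ball` and `trust_radius`, the budget parameter, and the rule that shrinks trust as the temporal-difference (TD) error of the Q-value advice accumulates are all illustrative assumptions.

```python
import numpy as np


def project_to_ball(advice_action, baseline_action, radius):
    """Project the advised action onto the L2 ball of the given radius
    centered at the baseline action (hypothetical helper)."""
    advice_action = np.asarray(advice_action, dtype=float)
    baseline_action = np.asarray(baseline_action, dtype=float)
    diff = advice_action - baseline_action
    dist = np.linalg.norm(diff)
    if dist <= radius:
        # The advice is already close enough to the baseline: follow it.
        return advice_action
    # Otherwise move from the baseline toward the advice, stopping at the
    # boundary of the ball.
    return baseline_action + (radius / dist) * diff


def trust_radius(td_errors, budget):
    """Assumed rule, for illustration only: the remaining trust is a fixed
    budget minus the accumulated absolute TD error of the Q-value advice,
    floored at zero."""
    return max(0.0, budget - float(np.sum(np.abs(td_errors))))


# Small TD errors so far: some trust remains, so the chosen action moves
# partway from the baseline toward the advice.
radius = trust_radius(td_errors=[0.5, -0.25], budget=1.0)
action = project_to_ball([3.0, 4.0], [0.0, 0.0], radius)
```

Under these assumptions, a radius of zero reproduces the baseline action exactly (the robust extreme), while a large radius follows the advice unchanged (the consistent extreme). Letting the radius depend on observed TD errors is one way to use information that a black-box advice source would not expose.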

Videos

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference