Markov flow policy -- deep MC

Nitsan Soffair; Gilad Katz

arXiv:2405.00877·cs.LG·September 2, 2024

TL;DR

The paper introduces the Markov Flow Policy, a neural network flow-based method that improves evaluation accuracy and performance in short-term tasks by addressing biases and limitations of traditional discounted algorithms.

Contribution

It proposes a novel neural network flow approach for forward-view predictions, enhancing short-term task performance and mitigating train-test bias in reinforcement learning.

Findings

01 Significant performance improvements on MuJoCo benchmarks

02 Effective reduction of train-test bias in evaluation

03 Easy to implement and integrate into existing frameworks

Abstract

Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimations, which can impede their efficacy in addressing simple, short-term tasks and impose undesired temporal discounts (\(\gamma\)). Interestingly, these algorithms are often tested without applying a discount, a phenomenon we refer to as the train-test bias. In response to these challenges, we propose the Markov Flow Policy (MFP), which utilizes a non-negative neural network flow to enable comprehensive forward-view predictions. Through integration into the TD7 codebase and evaluation using the MuJoCo benchmark, we observe significant performance improvements, positioning MFP as a straightforward, practical, and easily implementable solution within the domain of average rewards algorithms.
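The train-test bias described in the abstract can be illustrated with a small numeric sketch. The snippet below is not taken from the paper or from the TD7 codebase. It only compares the discounted return that a discounted algorithm is trained to estimate with the undiscounted return that is typically reported at test time. The reward sequence, the value of `gamma`, and the function names are assumptions chosen for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over the episode (the usual training target)."""
    total = 0.0
    # Accumulate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def undiscounted_return(rewards):
    """Plain sum of rewards (the usual test-time metric)."""
    return float(sum(rewards))


# Hypothetical episode: a reward of 1.0 at each of 1000 steps.
rewards = [1.0] * 1000
gamma = 0.99  # assumed discount factor, not a value from the paper

train_objective = discounted_return(rewards, gamma)  # close to 1 / (1 - gamma) = 100
test_metric = undiscounted_return(rewards)           # exactly 1000.0

print(f"discounted (train) return:  {train_objective:.2f}")
print(f"undiscounted (test) return: {test_metric:.2f}")
```

Under these assumptions the two quantities differ by roughly a factor of ten, which gives a sense of the mismatch between the training objective and the evaluation metric that the abstract calls the train-test bias.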

Taxonomy

Topics: Reinforcement Learning in Robotics