Diminishing Return of Value Expansion Methods

Daniel Palenicek, Michael Lutter, João Carvalho, Daniel Dennert, Faran Ahmad, and Jan Peters

arXiv:2412.20537·cs.LG·December 31, 2024

TL;DR

This paper empirically shows that longer rollout horizons in model-based value expansion improve sample efficiency, but with quickly diminishing returns, and that higher model accuracy yields only marginal gains over learned models at the same horizon, indicating that other factors limit performance.

Contribution

The study reveals that longer horizons and improved model accuracy yield diminishing returns in sample efficiency, challenging the belief that model accuracy is the main bottleneck.

Findings

1. Longer rollout horizons provide limited additional sample efficiency.

2. Enhanced model accuracy only marginally improves performance.

3. Model-free value expansion methods achieve comparable performance without a dynamics model.

Abstract

Model-based reinforcement learning aims to increase sample efficiency, but the accuracy of dynamics models and the resulting compounding errors are often seen as key limitations. This paper empirically investigates potential sample efficiency gains from improved dynamics models in model-based value expansion methods. Our study reveals two key findings when using oracle dynamics models to eliminate compounding errors. First, longer rollout horizons enhance sample efficiency, but the improvements quickly diminish with each additional expansion step. Second, increased model accuracy only marginally improves sample efficiency compared to learned models with identical horizons. These diminishing returns in sample efficiency are particularly noteworthy when compared to model-free value expansion methods. These model-free algorithms achieve comparable performance without the computational…
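
To make the notion of a rollout horizon concrete, here is a minimal sketch of an H-step value expansion target in plain Python. It is not taken from the authors' repository; the function `value_expansion_target`, its arguments, and the toy `policy`, `model`, and `value_fn` are all hypothetical names chosen for illustration. The sketch assumes a deterministic dynamics model that returns the next state and reward, and a horizon of 0 reduces to the usual one-step bootstrapped target.

```python
from typing import Callable, Tuple

State = float
Action = float


def value_expansion_target(
    reward: float,
    next_state: State,
    horizon: int,
    gamma: float,
    policy: Callable[[State], Action],
    model: Callable[[State, Action], Tuple[State, float]],
    value_fn: Callable[[State], float],
) -> float:
    """Return r + sum_{t=1..H} gamma^t r_t + gamma^(H+1) V(s_{H+1}).

    The rollout starts from `next_state` and follows `policy` through
    `model` for `horizon` steps (all names here are illustrative).
    """
    target = reward
    discount = gamma
    state = next_state
    for _ in range(horizon):
        action = policy(state)
        state, model_reward = model(state, action)  # one simulated step
        target += discount * model_reward
        discount *= gamma
    # Bootstrap from the value estimate at the end of the rollout.
    return target + discount * value_fn(state)


# Toy setup (an assumption for illustration): every step yields reward 1,
# so the exact value under gamma = 0.5 is 1 / (1 - 0.5) = 2 in every state.
def policy(state: State) -> Action:
    return 0.0


def model(state: State, action: Action) -> Tuple[State, float]:
    return state + action, 1.0


def value_fn(state: State) -> float:
    return 2.0


targets = [
    value_expansion_target(1.0, 0.0, h, 0.5, policy, model, value_fn)
    for h in range(4)
]
```

In this toy setup the value estimate is already exact, so the target is the same for every horizon. With an imperfect value estimate, each extra step multiplies the weight on the bootstrap term by another factor of gamma, which is one intuition for why additional expansion steps can matter less and less.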

Code & Models

Repositories

danielpalen/value_expansion (JAX, official)

Taxonomy

Topics: Financial Reporting and Valuation Research