From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge

Xiefeng Wu

arXiv:2410.01458 · cs.AI · October 3, 2024

Open Access

TL;DR

This paper introduces Q-shaping, a method that uses a large language model (LLM) to initialize Q-values. Across 20 environments, it substantially improves sample efficiency and outperforms LLM-based reward shaping methods.

Contribution

The paper proposes Q-shaping as a novel, unbiased approach to incorporating domain knowledge through LLM-guided Q-value initialization, improving reinforcement learning performance.
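To make "Q-value initialization" concrete, here is a minimal sketch of plain heuristic Q-value initialization in tabular Q-learning, the idea that Q-shaping extends; it is not the paper's algorithm. Everything in it is an assumption for illustration: the six-state chain environment, the hyperparameters, and the hypothetical `heuristic_q` function, which stands in for Q-value guesses an LLM might supply (no LLM is called). With a helpful heuristic, the greedy agent heads straight for the goal from the first episode, while the zero-initialized agent must first find the goal by a random walk.

```python
import numpy as np

N_STATES = 6               # assumed toy chain: states 0..5, state 5 is the goal
LEFT, RIGHT = 0, 1
GAMMA, ALPHA = 0.9, 0.5    # assumed discount and learning rate

def step(state, action):
    """Deterministic chain: move one cell; reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def heuristic_q(state, action):
    """Hypothetical stand-in for LLM-suggested Q-values: 'moving right looks good'."""
    return 0.5 if action == RIGHT else 0.0

def init_q(heuristic=None):
    """Zero Q-table, or one filled from the heuristic (Q-value initialization)."""
    q = np.zeros((N_STATES, 2))
    if heuristic is not None:
        for s in range(N_STATES):
            for a in (LEFT, RIGHT):
                q[s, a] = heuristic(s, a)
    return q

def run_episode(q, rng, max_steps=10_000):
    """One greedy Q-learning episode from state 0 (random tie-breaking); returns steps used."""
    s = 0
    for t in range(1, max_steps + 1):
        best = np.flatnonzero(q[s] == q[s].max())   # all greedy actions
        a = int(rng.choice(best))
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * q[s2].max()
        q[s, a] += ALPHA * (target - q[s, a])       # standard Q-learning update
        if done:
            return t
        s = s2
    return max_steps

def total_steps(heuristic=None, episodes=20, seed=0):
    """Environment steps needed over several episodes, sharing one Q-table."""
    rng = np.random.default_rng(seed)
    q = init_q(heuristic)
    return sum(run_episode(q, rng) for _ in range(episodes))

if __name__ == "__main__":
    print("zero init:     ", total_steps())
    print("heuristic init:", total_steps(heuristic_q))  # 100 = 20 episodes x 5 steps
```

The heuristic only sets the starting table; the update rule is untouched, so a good guess saves exploration without changing what the agent is optimizing.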

Findings

01. Q-shaping improves sample efficiency by 16.87% over the best baseline in each environment.

02. Q-shaping improves sample efficiency by 253.80% compared with LLM-based reward shaping methods.

03. Q-shaping is general and robust across diverse tasks, and it guarantees optimality.
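Why can an initialization not bias the final answer? In the tabular setting with exact Bellman backups, the standard argument is that the Bellman optimality operator is a γ-contraction in the sup norm, so iterating it from any initial table, including one filled with LLM-suggested values, converges to the same fixed point Q*. This is the textbook argument offered as intuition; the paper's own optimality guarantee may be stated under different assumptions.

```latex
(\mathcal{T}Q)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a')

\|\mathcal{T}Q_1 - \mathcal{T}Q_2\|_\infty \le \gamma\, \|Q_1 - Q_2\|_\infty
\quad\Longrightarrow\quad
\|\mathcal{T}^k Q_0 - Q^*\|_\infty \le \gamma^k\, \|Q_0 - Q^*\|_\infty \xrightarrow{k \to \infty} 0
```

The choice of Q_0 affects only how many backups are needed, not the limit. Replacing r with a shaped reward r + F changes the operator itself and, in general, its fixed point.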

Abstract

Q-shaping is an extension of Q-value initialization and serves as an alternative to reward shaping for incorporating domain knowledge to accelerate agent training, thereby improving sample efficiency by directly shaping Q-values. This approach is both general and robust across diverse tasks, allowing for immediate impact assessment while guaranteeing optimality. We evaluated Q-shaping across 20 different environments using a large language model (LLM) as the heuristic provider. The results demonstrate that Q-shaping significantly enhances sample efficiency, achieving a 16.87% improvement over the best baseline in each environment and a 253.80% improvement compared to LLM-based reward shaping methods. These findings establish Q-shaping as a superior and unbiased alternative to conventional reward shaping in reinforcement learning.
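The "unbiased" claim can be illustrated by contrasting two ways of injecting the same heuristic. The sketch below uses an assumed toy chain and a hypothetical, deliberately misleading `misleading_bonus` heuristic, and runs exact Q-iteration three ways: from a zero table, from a table initialized with the heuristic, and with the heuristic added to every reward as an arbitrary (not potential-based) shaping bonus. It illustrates the general principle and does not reproduce the paper's experiments.

```python
import numpy as np

N, GAMMA = 6, 0.9          # assumed chain: states 0..5, state 5 is a terminal goal
LEFT, RIGHT = 0, 1

def step(s, a):
    """Deterministic chain: reward 1.0 only on reaching the goal."""
    nxt = max(0, s - 1) if a == LEFT else s + 1
    done = nxt == N - 1
    return nxt, (1.0 if done else 0.0), done

def misleading_bonus(s, a):
    """Hypothetical, deliberately wrong heuristic: 'moving left looks good'."""
    return 0.5 if a == LEFT else 0.0

def heuristic_table():
    """Initial Q-table over non-terminal states, filled from the heuristic."""
    return np.array([[misleading_bonus(s, a) for a in (LEFT, RIGHT)]
                     for s in range(N - 1)])

def q_iteration(bonus=None, q0=None, iters=300):
    """Exact Bellman optimality backups over non-terminal states 0..N-2.

    bonus: optional term added to every reward (reward shaping).
    q0:    optional starting Q-table (Q-value initialization).
    """
    q = np.zeros((N - 1, 2)) if q0 is None else q0.copy()
    for _ in range(iters):
        new_q = np.empty_like(q)
        for s in range(N - 1):
            for a in (LEFT, RIGHT):
                s2, r, done = step(s, a)
                if bonus is not None:
                    r += bonus(s, a)
                # terminal transitions contribute no future value
                new_q[s, a] = r + (0.0 if done else GAMMA * q[s2].max())
        q = new_q
    return q

def greedy(q):
    """Greedy action per non-terminal state."""
    return [int(a) for a in q.argmax(axis=1)]

if __name__ == "__main__":
    print(greedy(q_iteration()))                        # [1, 1, 1, 1, 1]
    print(greedy(q_iteration(q0=heuristic_table())))    # [1, 1, 1, 1, 1]
    print(greedy(q_iteration(bonus=misleading_bonus)))  # [0, 0, 0, 0, 0]
```

Initialization only moves the starting point, so both unshaped runs reach the same fixed point and the same always-RIGHT policy. Adding the bonus to the reward changes the problem being solved, and the shaped agent learns to move left forever to collect the bonus instead of reaching the goal.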

Taxonomy

Topics: Artificial Intelligence in Law