Value Function Initialization for Knowledge Transfer and Jump-start in Deep Reinforcement Learning

Soumia Mehimeh

arXiv:2508.09277·cs.AI·August 14, 2025

TL;DR

This paper introduces DQInit, a novel method for value function initialization in deep reinforcement learning that leverages prior task knowledge to enhance early learning efficiency and stability.

Contribution

The paper proposes an approach for transferring value estimates in DRL that combines compact tabular Q-values with a knownness-based mechanism, addressing the challenges posed by continuous state-action spaces and noisy neural network approximations.

Findings

01. DQInit improves early learning speed in continuous control tasks.

02. It improves stability and overall performance compared with standard initialization.

03. It transfers knowledge without relying on policies or demonstrations.

Abstract

Value function initialization (VFI) is an effective way to achieve a jumpstart in reinforcement learning (RL) by leveraging value estimates from prior tasks. While this approach is well established in tabular settings, extending it to deep reinforcement learning (DRL) poses challenges due to the continuous nature of the state-action space, the noisy approximations of neural networks, and the impracticality of storing all past models for reuse. In this work, we address these challenges and introduce DQInit, a method that adapts value function initialization to DRL. DQInit reuses compact tabular Q-values extracted from previously solved tasks as a transferable knowledge base. It employs a knownness-based mechanism to softly integrate these transferred values into underexplored regions and gradually shift toward the agent's learned estimates, avoiding the limitations of fixed time decay.…
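
As a rough illustration of the knownness idea described in the abstract, the sketch below blends a transferred tabular Q-value with the agent's own estimate, weighting the transferred value more heavily where the agent has little experience. It is not the paper's implementation: the class name `KnownnessBlender`, the visit-count definition of knownness, the linear blending rule, the uniform binning of continuous states, and the parameters `bin_width` and `visits_for_known` are all assumptions made for illustration.

```python
from collections import defaultdict


class KnownnessBlender:
    """Hypothetical sketch: mix a transferred tabular Q-value with a learned one.

    Assumption: knownness is a visit count scaled into [0, 1]; the paper may
    define it differently.
    """

    def __init__(self, prior_q, bin_width=0.5, visits_for_known=10):
        # prior_q maps (state_bin, action) -> Q-value taken from earlier tasks.
        self.prior_q = prior_q
        self.bin_width = bin_width                # assumed discretization width
        self.visits_for_known = visits_for_known  # visits until fully "known"
        self.visits = defaultdict(int)

    def _key(self, state, action):
        # Map a continuous state onto a coarse grid cell (assumed uniform bins).
        state_bin = tuple(int(x // self.bin_width) for x in state)
        return state_bin, action

    def record_visit(self, state, action):
        self.visits[self._key(state, action)] += 1

    def knownness(self, state, action):
        count = self.visits[self._key(state, action)]
        return min(1.0, count / self.visits_for_known)

    def blended_q(self, state, action, learned_q):
        # Low knownness -> rely on the transferred value;
        # high knownness -> rely on the agent's own estimate.
        k = self.knownness(state, action)
        # With no transferred value for this cell, fall back to learned_q.
        prior = self.prior_q.get(self._key(state, action), learned_q)
        return (1.0 - k) * prior + k * learned_q
```

Because the weight depends on how often a region has been visited rather than on elapsed training time, frequently visited regions move to the learned estimate quickly while rarely visited ones keep relying on the transferred value. In a DRL agent, `learned_q` would come from the Q-network's output for the same state-action pair.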
