Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning

Théo Vincent; Yogesh Tripathi; Tim Faust; Abdullah Akgül; Yaniv Oren; Melih Kandemir; Jan Peters; Carlo D'Eramo

arXiv:2506.04398·cs.LG·March 2, 2026

TL;DR

This paper introduces iterated Shared Q-Learning (iS-QL), a method that sits between target-free and target-based reinforcement learning: it improves sample efficiency over target-free methods and narrows the performance gap to target-based methods while adding little memory overhead.

Contribution

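The paper proposes using a copy of the last linear layer of the online network as the target network, while all remaining parameters are shared with the up-to-date online network. This keeps memory use close to that of a target-free method while still allowing techniques from the target-based literature.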

Findings

01. iS-QL improves sample efficiency over target-free methods.

02. The approach narrows the performance gap between target-free and target-based algorithms.

03. It keeps memory usage low while retaining the benefits of target-based methods.

Abstract

The use of target networks in deep reinforcement learning is a widely popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay the propagation of Bellman updates compared to an ideal target-free approach. In this work, we step out of the binary choice between target-free and target-based algorithms. We introduce a new method that uses a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network. This simple modification enables us to keep the target-free's low-memory footprint while leveraging the target-based literature. We find that combining our approach with the concept of iterated $Q$-learning, which consists of learning consecutive Bellman updates in parallel, helps improve the…
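The shared-parameter idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-layer network, the parameter names (`W_feat`, `W_head`), and the helper functions are all hypothetical, chosen only to show a target that reuses the online feature layers and keeps a frozen copy of the final linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(obs_dim, hidden_dim, n_actions):
    """Hypothetical Q-network: one shared feature layer plus a linear head."""
    return {
        "W_feat": rng.normal(scale=0.1, size=(obs_dim, hidden_dim)),
        "b_feat": np.zeros(hidden_dim),
        "W_head": rng.normal(scale=0.1, size=(hidden_dim, n_actions)),
        "b_head": np.zeros(n_actions),
    }

def features(params, obs):
    # Feature layers are always the up-to-date online parameters.
    return np.maximum(obs @ params["W_feat"] + params["b_feat"], 0.0)

def q_values(params, head, obs):
    # `head` is either the online head or a frozen copy of it.
    W, b = head
    return features(params, obs) @ W + b

def copy_head(params):
    # The only extra memory: a copy of the last linear layer.
    return params["W_head"].copy(), params["b_head"].copy()

def td_target(params, target_head, reward, next_obs, done, gamma=0.99):
    # Bootstrapped target: shared online features, frozen last layer.
    # In an autodiff framework this value would be treated as a constant.
    next_q = q_values(params, target_head, next_obs)
    return reward + gamma * (1.0 - done) * next_q.max(axis=-1)
```

Under these assumptions, the target costs only `hidden_dim * n_actions + n_actions` extra parameters, compared with a full second copy of the network in a standard target-based setup. Updating the feature layers immediately changes the target's output, while updating the online head does not until `copy_head` is called again.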

Taxonomy

Methods: Linear Layer · Q-Learning