Soft $Q(\lambda)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces

Pranav Mahajan; Ben Seymour

arXiv:2604.13780 · cs.LG · April 16, 2026

TL;DR

This paper introduces Soft Q(λ), an off-policy, eligibility trace-based method for entropy-regularised reinforcement learning, extending soft Q-learning with multi-step and off-policy capabilities.

Contribution

The paper formalises an n-step soft Q-learning framework, introduces a novel Soft Tree Backup operator for the fully off-policy case, and unifies the two into Soft Q(λ), an online eligibility trace method for off-policy learning.
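
For background, the entropy-regularised setting that soft Q-learning is usually built on can be written as follows. This is the standard KL-penalised objective and its Boltzmann-form optimal policy, not the paper's n-step or Soft Tree Backup derivation; the symbols $\beta$ (inverse temperature), $\pi_{\mathrm{ref}}$ (reference policy) and $\gamma$ (discount factor) are notation assumed here rather than taken from the paper.

```latex
\begin{align}
J(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
  \left(r_t - \frac{1}{\beta}\,
  \mathrm{KL}\!\left(\pi(\cdot \mid s_t)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid s_t)\right)\right)\right], \\
\pi^{*}(a \mid s) &= \frac{\pi_{\mathrm{ref}}(a \mid s)\,\exp\!\left(\beta\, Q^{*}(s,a)\right)}
  {\sum_{a'} \pi_{\mathrm{ref}}(a' \mid s)\,\exp\!\left(\beta\, Q^{*}(s,a')\right)}.
\end{align}
```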

Findings

01 Proposes a formal n-step soft Q-learning formulation.

02 Introduces a Soft Tree Backup operator for off-policy learning.

03 Unifies these into the Soft Q(λ) framework for efficient credit assignment.

Abstract

Soft Q-learning has emerged as a versatile model-free method for entropy-regularised reinforcement learning, optimising for returns augmented with a penalty on the divergence from a reference policy. Despite its success, the multi-step extensions of soft Q-learning remain relatively unexplored and limited to on-policy action sampling under the Boltzmann policy. In this brief research note, we first present a formal $n$-step formulation for soft Q-learning and then extend this framework to the fully off-policy case by introducing a novel Soft Tree Backup operator. Finally, we unify these developments into Soft $Q(\lambda)$, an elegant online, off-policy, eligibility trace framework that allows for efficient credit assignment under arbitrary behaviour policies. Our derivations propose a model-free method for learning entropy-regularised value functions that can be utilised in future…
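
To make the starting point concrete, here is a minimal tabular sketch of the standard one-step soft Q-learning update that the abstract describes extending. It is not the paper's n-step, Soft Tree Backup, or Soft Q(λ) method. The function names, the inverse temperature `beta`, the reference distribution `ref_probs` (assumed to have full support), and the default hyperparameters are all illustrative assumptions.

```python
import numpy as np


def soft_value(q_row, ref_probs, beta):
    """Soft state value: (1/beta) * log sum_a ref(a) * exp(beta * Q(s, a)).

    Computed with the max-shift trick for numerical stability.
    Assumes ref_probs is strictly positive (hypothetical reference policy).
    """
    z = beta * np.asarray(q_row, dtype=float) + np.log(ref_probs)
    z_max = np.max(z)
    return (z_max + np.log(np.sum(np.exp(z - z_max)))) / beta


def boltzmann_policy(q_row, ref_probs, beta):
    """Boltzmann policy: proportional to ref(a) * exp(beta * Q(s, a))."""
    z = beta * np.asarray(q_row, dtype=float) + np.log(ref_probs)
    p = np.exp(z - np.max(z))
    return p / p.sum()


def soft_q_update(Q, s, a, r, s_next, done, ref_probs,
                  beta=1.0, gamma=0.99, alpha=0.1):
    """One-step soft Q-learning update on a tabular Q array (in place).

    The bootstrap term is the soft value of the next state, so the target
    does not depend on which behaviour policy chose the action.
    Returns the TD error.
    """
    bootstrap = 0.0 if done else soft_value(Q[s_next], ref_probs, beta)
    td_error = r + gamma * bootstrap - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error


if __name__ == "__main__":
    ref = np.array([0.5, 0.5])          # uniform reference policy (assumption)
    Q = np.zeros((2, 2))                # 2 states, 2 actions
    delta = soft_q_update(Q, s=0, a=1, r=1.0, s_next=1, done=False, ref_probs=ref)
    print("TD error:", delta)
    print("Q:", Q)
    print("policy at state 0:", boltzmann_policy(Q[0], ref, beta=1.0))
```

As `beta` grows, the soft value approaches the hard maximum over actions and the update approaches ordinary Q-learning; as `beta` shrinks, the Boltzmann policy stays closer to the reference policy.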
