Differentiable Evolutionary Reinforcement Learning

Sitao Cheng; Tianle Li; Xuhan Huang; Xunjian Yin; Difan Zou

arXiv:2512.13399·cs.AI·May 14, 2026

TL;DR

DERL introduces a differentiable, bi-level framework for autonomous reward structure discovery in reinforcement learning, enabling better generalization and causal understanding across complex reasoning tasks.

Contribution

DERL proposes a differentiable meta-optimization approach for evolving reward functions, improving over the black-box, derivative-free search used by prior automated reward optimization methods in RL.

Findings

01. DERL achieves state-of-the-art results on agent benchmarks.

02. It substantially outperforms non-differentiable baselines.

03. DERL captures intrinsic causal structures of tasks.

Abstract

Crafting effective reward signals remains a central challenge in Reinforcement Learning (RL), especially for complex reasoning tasks. Existing automated reward optimization methods typically rely on derivative-free search heuristics that treat the reward function as a black box, failing to exploit the causal dynamics between reward structure modifications and policy performance. We introduce Differentiable Evolutionary Reinforcement Learning (DERL), a bi-level framework for the autonomous discovery of optimal reward structures. DERL employs a Meta-Optimizer that evolves a reward function through the composition of structured atomic primitives to guide an inner-loop policy. Unlike prior black-box methods, DERL introduces differentiability into the meta-optimization process by updating the Meta-Optimizer using policy gradients derived from inner-loop validation performance. This allows…
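The bi-level loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three-armed task, the `PRIMITIVES` list, and every function name and hyperparameter below are hypothetical, and the "composition of atomic primitives" is simplified to choosing a single primitive. What it does show is the core idea: a meta-level distribution over reward structures is updated with a policy-gradient (REINFORCE) step whose reward is the validation performance of a policy trained in the inner loop.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical atomic primitives: each one scores an action in a 3-armed toy task.
PRIMITIVES = [
    lambda a: 1.0 if a == 0 else 0.0,  # misleading signal
    lambda a: 1.0 if a == 1 else 0.0,  # another misleading signal
    lambda a: 1.0 if a == 2 else 0.0,  # signal aligned with the true outcome
]
TRUE_ACTION = 2  # assumption: validation only rewards this action


def train_inner_policy(reward_fn, steps=50, lr=0.5):
    """Inner loop: train a softmax policy on the candidate reward function."""
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        rewards = [reward_fn(a) for a in range(3)]
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward w.r.t. each logit: p_a * (r_a - E[r]).
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return logits


def validation_score(policy_logits):
    """Validation performance: probability that the policy picks the true action."""
    return softmax(policy_logits)[TRUE_ACTION]


def run_meta_optimization(meta_steps=200, meta_lr=0.5, seed=0):
    """Outer loop: REINFORCE update of the meta-level distribution over primitives."""
    rng = random.Random(seed)
    meta_logits = [0.0] * len(PRIMITIVES)
    baseline = 0.0  # running average of validation scores, used to reduce variance
    for _ in range(meta_steps):
        meta_probs = softmax(meta_logits)
        choice = rng.choices(range(len(PRIMITIVES)), weights=meta_probs)[0]
        policy = train_inner_policy(PRIMITIVES[choice])
        score = validation_score(policy)
        advantage = score - baseline
        baseline = 0.9 * baseline + 0.1 * score
        for i in range(len(meta_logits)):
            # Gradient of log pi(choice) w.r.t. logit i for a softmax distribution.
            grad_logp = (1.0 if i == choice else 0.0) - meta_probs[i]
            meta_logits[i] += meta_lr * advantage * grad_logp
    return softmax(meta_logits)


if __name__ == "__main__":
    print(run_meta_optimization())
```

In this toy setting the meta-level distribution shifts toward the primitive whose induced policy scores well on validation, without the outer loop ever inspecting the reward functions directly. A real system would replace the categorical choice with a generator of structured reward compositions and the bandit policy with an actual RL training run.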

Code & Models

Repositories

sitaocheng/DERL (GitHub)
