TL;DR
DERL introduces a differentiable, bi-level framework for autonomous reward structure discovery in reinforcement learning, enabling better generalization and causal understanding across complex reasoning tasks.
Contribution
DERL proposes a differentiable meta-optimization approach for evolving reward functions, in contrast to the derivative-free, black-box search used by prior automated reward optimization methods in RL.
Findings
DERL achieves state-of-the-art results on agent benchmarks.
It substantially outperforms non-differentiable baselines.
DERL captures intrinsic causal structures of tasks.
Abstract
Crafting effective reward signals remains a central challenge in Reinforcement Learning (RL), especially for complex reasoning tasks. Existing automated reward optimization methods typically rely on derivative-free search heuristics that treat the reward function as a black box, failing to exploit the causal dynamics between reward structure modifications and policy performance. We introduce Differentiable Evolutionary Reinforcement Learning (DERL), a bi-level framework for the autonomous discovery of optimal reward structures. DERL employs a Meta-Optimizer that evolves a reward function through the composition of structured atomic primitives to guide an inner-loop policy. Unlike prior black-box methods, DERL introduces differentiability into the meta-optimization process by updating the Meta-Optimizer using policy gradients derived from inner-loop validation performance. This allows…
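The bi-level loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical three-action task, two hypothetical primitives (`outcome` and `format`), a fixed menu of candidate reward structures, and a softmax "Meta-Optimizer" updated with a REINFORCE-style policy gradient on inner-loop validation performance. All names and hyperparameters below are illustrative.

```python
import math
import random

# Hypothetical atomic primitives: each maps a rollout to a score in [0, 1].
PRIMITIVES = {
    "outcome": lambda rollout: rollout["correct"],
    "format": lambda rollout: rollout["well_formed"],
}

# Hypothetical candidate reward structures (weights over the primitives).
CANDIDATES = [
    {"outcome": 1.0, "format": 0.0},
    {"outcome": 0.5, "format": 0.5},
    {"outcome": 0.0, "format": 1.0},
]

# Toy task (an assumption): three actions, only the first one solves the task.
ACTIONS = [
    {"correct": 1.0, "well_formed": 0.5},
    {"correct": 0.0, "well_formed": 1.0},
    {"correct": 0.0, "well_formed": 0.0},
]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compose_reward(weights):
    """Build a reward function as a weighted sum of atomic primitives."""
    def reward(rollout):
        return sum(w * PRIMITIVES[name](rollout) for name, w in weights.items())
    return reward


def inner_loop_validation(weights, steps=20, lr=1.0):
    """Train a softmax policy on the composed reward, then return true task success."""
    reward = compose_reward(weights)
    rewards = [reward(a) for a in ACTIONS]
    prefs = [0.0] * len(ACTIONS)
    for _ in range(steps):
        probs = softmax(prefs)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward for a softmax policy.
        prefs = [x + lr * p * (r - expected) for x, p, r in zip(prefs, probs, rewards)]
    probs = softmax(prefs)
    # Validation measures correctness, not the shaped reward.
    return sum(p * a["correct"] for p, a in zip(probs, ACTIONS))


def train_meta_optimizer(meta_steps=400, batch_size=4, meta_lr=0.5, seed=0):
    """Outer loop: policy-gradient update of a distribution over reward structures."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(meta_steps):
        probs = softmax(logits)
        picks = rng.choices(range(len(CANDIDATES)), weights=probs, k=batch_size)
        scores = [inner_loop_validation(CANDIDATES[k]) for k in picks]
        baseline = sum(scores) / len(scores)  # batch-mean baseline
        for k, score in zip(picks, scores):
            advantage = score - baseline
            for j in range(len(logits)):
                # d log pi(k) / d logit_j for a softmax distribution.
                grad_log_prob = (1.0 if j == k else 0.0) - probs[j]
                logits[j] += meta_lr * advantage * grad_log_prob / batch_size
    return softmax(logits)


if __name__ == "__main__":
    print(train_meta_optimizer())
```

In this toy setting the outer loop shifts probability toward the reward structure whose trained policy validates best. The actual system, per the abstract, uses a learned Meta-Optimizer that composes structured primitives rather than choosing from a fixed menu.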
Code & Models
- 🤗 DifferentiableEvolutionaryRL/DERL-Meta-Optimizer-Init-Qwen2.5-0.5B-Instruct · model · 5 dl · ♡ 1
- 🤗 DifferentiableEvolutionaryRL/DERL-GSM8k-Math-Qwen2.5-3B · model · 1 dl
- 🤗 DifferentiableEvolutionaryRL/DERL-MATH-Qwen-2.5-3B · model
- 🤗 DifferentiableEvolutionaryRL/DERL-ALFWorld-L0-Qwen2.5-1.5B · model · 1 dl · ♡ 1
- 🤗 DifferentiableEvolutionaryRL/DERL-ALFWorld-L1-Qwen2.5-1.5B · model · 3 dl
- 🤗 DifferentiableEvolutionaryRL/DERL-ALFWorld-L2-Qwen2.5-1.5B · model · 1 dl · ♡ 1
- 🤗 DifferentiableEvolutionaryRL/DERL-ScienceWorld-L0-Qwen2.5-1.5B · model · 2 dl
- 🤗 DifferentiableEvolutionaryRL/DERL-ScienceWorld-L1-Qwen2.5-1.5B · model
- 🤗 DifferentiableEvolutionaryRL/DERL-ScienceWorld-L2-Qwen2.5-1.5B · model · 3 dl
