An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals

Yangyang Zhao; Ben Niu; Libo Qin; Shihan Wang

arXiv:2506.03519·cs.CL·June 6, 2025

TL;DR

This paper introduces a novel method combining evolutionary algorithms with deep reinforcement learning, enhanced by elite individual injection, to improve dialogue policy optimization efficiency and performance in task-oriented systems.

Contribution

It proposes an innovative integration of EA and DRL with an elite injection mechanism to accelerate convergence and enhance exploration in dialogue policy learning.

Findings

1. Significant performance improvements across four datasets.
2. Reduced exploration time due to elite individual injection.
3. Effective balance between exploration and exploitation achieved.

Abstract

Deep Reinforcement Learning (DRL) is widely used in task-oriented dialogue systems to optimize dialogue policy, but it struggles to balance exploration and exploitation due to the high dimensionality of state and action spaces. This challenge often results in local optima or poor convergence. Evolutionary Algorithms (EAs) have been proven to effectively explore the solution space of neural networks by maintaining population diversity. Inspired by this, we innovatively combine the global search capabilities of EA with the local optimization of DRL to achieve a balance between exploration and exploitation. Nevertheless, the inherent flexibility of natural language in dialogue tasks complicates this direct integration, leading to prolonged evolutionary times. Thus, we further propose an elite individual injection mechanism to enhance EA's search efficiency by adaptively introducing…
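To make the idea in the abstract concrete, the following is a minimal toy sketch of a generic evolutionary loop combined with a gradient-refined individual that is injected into the population. It is not the paper's algorithm: the abstract is truncated before it says what the elite injection mechanism introduces, so the injection rule here (replace the worst individual when the refined one beats it) is an assumption, as are the quadratic toy fitness standing in for dialogue success and all names (`fitness`, `gradient_step`, `mutate`, `evolve`).

```python
import random


def fitness(params, target):
    """Toy stand-in for dialogue success: negative squared distance to a target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def gradient_step(params, target, lr=0.1):
    """Local refinement (stand-in for a DRL update): one gradient-ascent step on fitness."""
    return [p + lr * 2.0 * (t - p) for p, t in zip(params, target)]


def mutate(params, rng, sigma=0.3):
    """Global search: perturb every parameter with Gaussian noise."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def evolve(target, pop_size=10, n_elite=3, generations=30, seed=0):
    rng = random.Random(seed)
    dim = len(target)
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    learner = [rng.uniform(-5, 5) for _ in range(dim)]  # gradient-trained individual
    best_history = []

    for _ in range(generations):
        # Local optimization of the separately trained individual.
        learner = gradient_step(learner, target)

        # Rank the population from best to worst.
        population.sort(key=lambda ind: fitness(ind, target), reverse=True)

        # Assumed injection rule: replace the worst individual only when
        # the refined individual is fitter than it.
        if fitness(learner, target) > fitness(population[-1], target):
            population[-1] = list(learner)
            population.sort(key=lambda ind: fitness(ind, target), reverse=True)

        # Keep the top individuals unchanged and refill with mutated copies.
        elites = population[:n_elite]
        offspring = [mutate(rng.choice(elites), rng) for _ in range(pop_size - n_elite)]
        population = elites + offspring
        best_history.append(fitness(elites[0], target))

    return elites[0], best_history


if __name__ == "__main__":
    best, history = evolve([1.0, -2.0, 0.5])
    print("best parameters:", best)
    print("best fitness per generation:", history)
```

Because the top individuals are carried over unchanged, the best fitness never decreases from one generation to the next, while the injected individual lets the population benefit from gradient-based refinement without waiting for mutation alone to find the same region.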

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications