Anti-Overestimation Dialogue Policy Learning for Task-Completion Dialogue System

Chang Tian; Wenpeng Yin; Marie-Francine Moens

arXiv:2207.11762·cs.CL·April 16, 2024


TL;DR

This paper introduces a dynamic partial average estimator to reduce overestimation in reinforcement learning for dialogue systems, improving stability and performance across multiple datasets.

Contribution

The paper proposes a dynamic partial average estimator (DPAV) that adaptively mitigates overestimation bias in RL-based dialogue policy learning, and it provides theoretical convergence guarantees.

Findings

01 Achieves better or comparable results to top baselines

02 Lower computational load compared to existing methods

03 Provides theoretical proof of convergence and bias bounds

Abstract

A dialogue policy module is an essential part of task-completion dialogue systems. Recently, reinforcement learning (RL)-based dialogue policy has attracted increasing interest. Its favorable performance and sound action decisions rely on an accurate estimation of action values. Overestimation is a widely known problem in RL: the estimate of the maximum action value is larger than the ground truth, which results in an unstable learning process and a suboptimal policy. This problem is detrimental to RL-based dialogue policy learning. To mitigate it, this paper proposes a dynamic partial average estimator (DPAV) of the ground truth maximum action value. DPAV calculates the partial average between the predicted maximum action value and the predicted minimum action value, where the weights are dynamically adaptive and problem-dependent. We incorporate DPAV into a deep Q-network as…
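The partial average described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract says the weights are dynamically adaptive and problem-dependent but does not give the update rule, so the sketch assumes a single fixed `weight` in [0, 1] supplied by the caller. The function names `partial_average` and `td_target`, the discount `gamma`, and the way the estimate is plugged into a one-step Q-learning target are all hypothetical choices made for illustration.

```python
from typing import Sequence


def partial_average(q_values: Sequence[float], weight: float) -> float:
    """Weighted average of the largest and smallest predicted action values.

    `weight` is an assumed, fixed mixing coefficient; the paper adapts its
    weights dynamically, and that adaptation is not reproduced here.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * max(q_values) + (1.0 - weight) * min(q_values)


def td_target(
    reward: float,
    next_q_values: Sequence[float],
    weight: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """One-step target that uses the partial average instead of a plain max.

    With weight = 1.0 this reduces to the standard Q-learning target
    reward + gamma * max(next_q_values).
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * partial_average(next_q_values, weight)


if __name__ == "__main__":
    next_q = [1.0, 3.0, 2.0]
    print(partial_average(next_q, weight=1.0))  # plain max
    print(partial_average(next_q, weight=0.5))  # pulled toward the min
    print(td_target(1.0, next_q, weight=0.5, gamma=0.5))
```

Because the minimum predicted value is never larger than the maximum, any weight below 1 yields an estimate at or below the plain max, which is the direction needed to counteract an upward bias. How far to pull the estimate down is exactly what the paper's dynamic weights decide.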


Taxonomy

Topics: Speech and dialogue systems · Topic Modeling