Why are Sequence-to-Sequence Models So Dull? Understanding the Low-Diversity Problem of Chatbots

Shaojie Jiang; Maarten de Rijke

arXiv:1809.01941 · cs.CL · September 7, 2018 · 5 cites

Open Access

TL;DR

This paper investigates why sequence-to-sequence (Seq2Seq) models produce dull, low-diversity responses in chatbots. It reviews known sources of the problem and existing approaches, identifies model over-confidence as a little-studied source, and sketches directions for addressing it, including confidence penalties and label smoothing.

Contribution

The paper identifies model over-confidence as a little-studied source of low diversity in Seq2Seq dialogue models and sketches directions for mitigating it.

Findings

01. Over-confidence contributes to low response diversity.

02. Existing approaches partially address the low-diversity problem.

03. Suggested directions include confidence penalties and label smoothing.

Abstract

Diversity is a long-studied topic in information retrieval that usually refers to the requirement that retrieved results should be non-repetitive and cover different aspects. In a conversational setting, an additional dimension of diversity matters: an engaging response generation system should be able to output responses that are diverse and interesting. Sequence-to-sequence (Seq2Seq) models have been shown to be very effective for response generation. However, dialogue responses generated by Seq2Seq models tend to have low diversity. In this paper, we review known sources and existing approaches to this low-diversity problem. We also identify a source of low diversity that has been little studied so far, namely model over-confidence. We sketch several directions for tackling model over-confidence and, hence, the low-diversity problem, including confidence penalties and label smoothing.
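The abstract names confidence penalties and label smoothing only as directions and gives no formulas. As a rough illustration, the sketch below uses the commonly used formulations of both for a single decoding step: label smoothing mixes the one-hot target with a uniform distribution, and the confidence penalty subtracts a scaled entropy term from the negative log-likelihood. The function names and the values of `epsilon` and `beta` are assumptions made for this example, not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a distribution; low entropy means a confident model."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def smoothed_targets(target_index, vocab_size, epsilon=0.1):
    """Uniform label smoothing: (1 - epsilon) on the gold token plus epsilon / V everywhere."""
    dist = [epsilon / vocab_size] * vocab_size
    dist[target_index] += 1.0 - epsilon
    return dist


def label_smoothing_loss(logits, target_index, epsilon=0.1):
    """Cross-entropy against the smoothed target instead of a one-hot target."""
    probs = softmax(logits)
    targets = smoothed_targets(target_index, len(logits), epsilon)
    return -sum(t * math.log(p) for p, t in zip(probs, targets) if t > 0)


def confidence_penalty_loss(logits, target_index, beta=0.1):
    """Negative log-likelihood minus beta times the output entropy."""
    probs = softmax(logits)
    nll = -math.log(probs[target_index])
    return nll - beta * entropy(probs)


if __name__ == "__main__":
    # Hypothetical 3-token vocabulary; token 0 is the gold token in both cases.
    over_confident = [10.0, 0.0, 0.0]
    moderate = [2.0, 0.0, 0.0]
    for name, logits in [("over-confident", over_confident), ("moderate", moderate)]:
        print(name,
              "smoothed loss:", label_smoothing_loss(logits, 0),
              "penalized loss:", confidence_penalty_loss(logits, 0))
```

Under plain cross-entropy, the sharply peaked distribution always scores better when the gold token is correct. Both regularizers change that: the smoothed loss is higher for the over-confident logits than for the moderate ones, and the entropy term rewards distributions that keep some probability mass on alternatives.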

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence