Why are Sequence-to-Sequence Models So Dull? Understanding the Low-Diversity Problem of Chatbots
Shaojie Jiang, Maarten de Rijke

TL;DR
This paper investigates why sequence-to-sequence models produce dull, low-diversity responses in chatbots. It identifies model over-confidence as a little-studied source of the problem and sketches possible remedies, including confidence penalties and label smoothing.
Contribution
The paper identifies model over-confidence as a little-studied source of low diversity in Seq2Seq dialogue models and sketches directions for mitigating it.
Findings
Over-confidence contributes to low response diversity.
Existing approaches partially address the low-diversity problem.
Proposed methods include confidence penalties and label smoothing.
Abstract
Diversity is a long-studied topic in information retrieval that usually refers to the requirement that retrieved results should be non-repetitive and cover different aspects. In a conversational setting, an additional dimension of diversity matters: an engaging response generation system should be able to output responses that are diverse and interesting. Sequence-to-sequence (Seq2Seq) models have been shown to be very effective for response generation. However, dialogue responses generated by Seq2Seq models tend to have low diversity. In this paper, we review known sources and existing approaches to this low-diversity problem. We also identify a source of low diversity that has been little studied so far, namely model over-confidence. We sketch several directions for tackling model over-confidence and, hence, the low-diversity problem, including confidence penalties and label smoothing.
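The two remedies named in the abstract can be illustrated with a minimal sketch for a single decoding step. It assumes the commonly used formulations (negative log-likelihood minus a weighted entropy term for the confidence penalty, and a one-hot target mixed with a uniform distribution for label smoothing), which may differ from the exact variants the paper discusses. The function names, the weights `beta` and `epsilon`, and the toy logits are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(probs, target):
    """Standard negative log-likelihood of the target token."""
    return -math.log(probs[target])


def entropy(probs):
    """Shannon entropy of the predicted distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_penalty_loss(logits, target, beta=0.1):
    """NLL minus beta * entropy: low-entropy (over-confident) outputs get less reward.

    Assumed formulation; beta is a hypothetical hyperparameter.
    """
    probs = softmax(logits)
    return nll(probs, target) - beta * entropy(probs)


def label_smoothing_loss(logits, target, epsilon=0.1):
    """Cross-entropy against a smoothed target: (1 - epsilon) one-hot + epsilon uniform.

    Assumed formulation; epsilon is a hypothetical hyperparameter.
    """
    probs = softmax(logits)
    k = len(probs)
    loss = 0.0
    for i, p in enumerate(probs):
        q = (1.0 - epsilon) * (1.0 if i == target else 0.0) + epsilon / k
        loss -= q * math.log(p)
    return loss


# Toy logits over a 4-token vocabulary; token 0 is the reference token.
peaked = [10.0, 0.0, 0.0, 0.0]  # an over-confident prediction
flat = [2.0, 0.0, 0.0, 0.0]     # a less confident prediction

plain_peaked = nll(softmax(peaked), 0)
smoothed_peaked = label_smoothing_loss(peaked, 0)
penalized_flat = confidence_penalty_loss(flat, 0)
```

With plain negative log-likelihood, the peaked prediction has a loss close to zero, so training keeps pushing the model toward it. Under label smoothing the same prediction has a clearly larger loss, and the confidence penalty lowers the loss of predictions that retain some entropy. The sketch does not guard against probabilities that underflow to zero, which a real implementation would need to handle.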
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
