Dialogue Response Ranking Training with Large-Scale Human Feedback Data
Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan

TL;DR
This paper introduces DialogRPT, a large-scale feedback prediction model trained on 133 million human feedback pairs, which improves response ranking in open-domain dialogue systems by better aligning with human preferences.
Contribution
The paper presents a novel large-scale training dataset and a GPT-2 based ranking model that outperforms baselines in predicting engaging responses and correlates well with human preferences.
Findings
DialogRPT outperforms baselines in feedback prediction.
The model correlates better with human preferences.
Combining feedback and scoring models improves response ranking.
Abstract
Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more followup interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between the feedback and engagingness, we convert the ranking problem to a comparison of response pairs which involve few confounding factors. We trained DialogRPT, a set of GPT-2 based models on 133M pairs of human feedback data and the resulting ranker outperformed several baselines.…
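The pairwise formulation in the abstract can be illustrated with a small sketch. It assumes a softmax-style pairwise loss over the two scores of a response pair, which is a common choice for comparison-based ranking; the abstract does not state the exact objective, so the loss form and the names `pairwise_loss` and `pair_accuracy` are illustrative assumptions, not the authors' implementation.

```python
import math


def pairwise_loss(score_pos: float, score_neg: float) -> float:
    """Assumed pairwise loss: -log(exp(s+) / (exp(s+) + exp(s-))).

    This equals softplus(s- - s+), computed here in a numerically stable way.
    score_pos is the ranker's score for the response with more feedback,
    score_neg the score for the response with less feedback.
    """
    diff = score_neg - score_pos
    return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def pair_accuracy(pairs) -> float:
    """Fraction of (score_pos, score_neg) pairs the ranker orders correctly."""
    return sum(1 for pos, neg in pairs if pos > neg) / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores for two response pairs under the same context.
    print(pairwise_loss(2.0, 0.0))   # small loss: pair ordered correctly
    print(pairwise_loss(0.0, 2.0))   # large loss: pair ordered incorrectly
    print(pair_accuracy([(2.0, 0.0), (0.0, 2.0)]))
```

Because only the score difference within a pair enters the loss, factors shared by both responses in the pair cancel out, which is the intuition behind comparing pairs rather than regressing on raw feedback counts.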
Code & Models
- 🤗 microsoft/DialogRPT-depth · model · 23 dl · ♡ 5
- 🤗 microsoft/DialogRPT-human-vs-machine · model · 44 dl · ♡ 5
- 🤗 microsoft/DialogRPT-human-vs-rand · model · 184 dl · ♡ 10
- 🤗 microsoft/DialogRPT-updown · model · 488 dl · ♡ 11
- 🤗 microsoft/DialogRPT-width · model · 23 dl · ♡ 1
- 🤗 titanicc/titanicdrpt · model · 3 dl
- 🤗 DyNin/carbot · model
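As a rough sketch of how one of the checkpoints listed above might be used to rank candidate responses: the example below assumes the checkpoint loads through the Hugging Face `transformers` sequence-classification classes and expects the context and response joined by `<|endoftext|>`, as the model cards describe to the best of our knowledge. The helper names `make_dialogrpt_scorer` and `rank_responses` are hypothetical, and the loader needs `torch`, `transformers`, and network access.

```python
from typing import Callable, List, Tuple


def make_dialogrpt_scorer(
    model_name: str = "microsoft/DialogRPT-updown",
) -> Callable[[str, str], float]:
    """Build a scoring function from a DialogRPT checkpoint (assumed usage)."""
    # Imported lazily so the ranking helper below works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    def score(context: str, response: str) -> float:
        # Assumed input format: context and response separated by <|endoftext|>.
        ids = tokenizer.encode(context + "<|endoftext|>" + response, return_tensors="pt")
        with torch.no_grad():
            logits = model(ids).logits
        return torch.sigmoid(logits).item()

    return score


def rank_responses(
    context: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Return (score, response) tuples sorted from highest to lowest score."""
    scored = [(score_fn(context, cand), cand) for cand in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    scorer = make_dialogrpt_scorer()  # downloads the checkpoint on first use
    context = "I love NLP!"
    for s, r in rank_responses(context, ["Me too!", "Here's a free textbook."], scorer):
        print(f"{s:.3f}  {r}")
```

Keeping the scorer behind a plain `(context, response) -> float` callable makes it easy to swap in a different checkpoint from the list, or to combine several scorers, without changing the ranking code.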
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
Methods: Linear Layer · Cosine Annealing · Layer Normalization · Weight Decay · Dropout · Dense Connections · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Multi-Head Attention
