Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu, Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang

TL;DR
This paper introduces Diamante, a novel approach that uses explicit and implicit human feedback to improve open-domain chatbots, yielding more engaging, human-aligned responses from Chinese dialogue models.
Contribution
The paper presents a new feedback collection method and a joint training paradigm that significantly improve the performance of Chinese open-domain dialogue models.
Findings
The Diamante dataset improves response engagement.
Joint training aligns responses with human preferences.
Chinese dialogue model performance is enhanced.
Abstract
Many open-domain dialogue models pre-trained on social media comments can generate coherent replies but have difficulty producing engaging responses when interacting with real users. This phenomenon may result mainly from the scarcity of annotated human-human conversations and from misalignment with human preferences. In this paper, we propose a novel and efficient approach, Diamante, to boost the open-domain chatbot, in which two kinds of human feedback (explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend model-generated candidate responses, Diamante efficiently collects human-demonstrated responses and constructs a Chinese chit-chat dataset. To enhance alignment with human preference, Diamante leverages the implicit preference in the data collection process and introduces the generation-evaluation…
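The abstract says annotators either select or amend model-generated candidates, and that the implicit preference in this process is reused for training. A minimal sketch of how such records could be turned into a demonstrated response plus preference pairs follows, assuming the chosen (or amended) response is preferred over every other candidate. The record layout and the names `AnnotatedTurn`, `demonstration`, `preference_pairs` and `pairwise_loss` are hypothetical, and the logistic pairwise ranking loss is a standard choice swapped in for illustration, not necessarily the paper's objective.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AnnotatedTurn:
    # Hypothetical record layout for one annotated dialogue turn.
    context: str
    candidates: List[str]          # model-generated candidate responses
    selected: Optional[int]        # index of the candidate the annotator chose, if any
    amended: Optional[str] = None  # annotator-revised response, if any


def demonstration(turn: AnnotatedTurn) -> str:
    """Explicit feedback: the amended response if present, else the selected candidate."""
    if turn.amended is not None:
        return turn.amended
    if turn.selected is None:
        raise ValueError("turn has neither a selection nor an amendment")
    return turn.candidates[turn.selected]


def preference_pairs(turn: AnnotatedTurn) -> List[Tuple[str, str]]:
    """Implicit feedback (assumed): the demonstrated response beats every other candidate."""
    best = demonstration(turn)
    return [
        (best, cand)
        for i, cand in enumerate(turn.candidates)
        if i != turn.selected and cand != best
    ]


def pairwise_loss(score_preferred: float, score_other: float) -> float:
    """Logistic ranking loss -log(sigmoid(s_pos - s_neg)), computed stably."""
    margin = score_preferred - score_other
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

For a turn with candidates `["a", "b", "c"]` where the annotator selected index 1, `preference_pairs` yields `[("b", "a"), ("b", "c")]`; an evaluation head trained with `pairwise_loss` on such pairs would learn to score the human-preferred response higher.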
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
