Topic-based Evaluation for Conversational Bots
Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, Ashwin Ram

TL;DR
This paper introduces topic-based metrics for evaluating conversational bots, focusing on coherence, engagement, and topic diversity, using a novel deep learning approach to classify conversation topics.
Contribution
It presents a new topic classification method with a topic-word attention mechanism and demonstrates that topic-based metrics align with human judgments in bot evaluation.
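The topic-word attention idea can be illustrated with a small NumPy sketch. This is one plausible reading of the mechanism, not the authors' implementation: the class name `AttentionalDAN`, the parameter layout, and the random initialization are all assumptions made for illustration, and the paper's exact architecture may differ. A plain deep averaging model averages word embeddings uniformly; here a table with one row per topic and one column per vocabulary word supplies per-topic weights, so each topic gets its own weighted average of the utterance.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class AttentionalDAN:
    """Illustrative averaging classifier with a topic-word attention table.

    All parameter names and shapes are assumptions for this sketch.
    """

    def __init__(self, vocab_size, n_topics, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(0, 0.1, (vocab_size, dim))       # word embeddings
        self.A = rng.normal(0, 0.1, (n_topics, vocab_size))  # topic-word attention table
        self.W = rng.normal(0, 0.1, (n_topics, dim))         # per-topic scoring weights
        self.b = np.zeros(n_topics)

    def forward(self, token_ids):
        token_ids = np.asarray(token_ids)
        emb = self.E[token_ids]                        # (T, dim) embeddings of the utterance
        # For each topic, normalize its table entries over the words in this utterance.
        attn = softmax(self.A[:, token_ids], axis=1)   # (K, T)
        topic_vecs = attn @ emb                        # (K, dim) topic-specific weighted averages
        logits = np.sum(topic_vecs * self.W, axis=1) + self.b
        return softmax(logits), attn

model = AttentionalDAN(vocab_size=50, n_topics=4, dim=8)
probs, attn = model.forward([3, 17, 42])
```

With untrained weights the outputs are meaningless, but the shapes show the intent: `probs` is a distribution over topics, and each row of `attn` indicates which words in the utterance a given topic attends to, which is how such a table could surface topic keywords alongside the classification.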
Findings
Metrics correlate with human ratings
Topic diversity improves user engagement
Proposed method outperforms baseline classifiers
Abstract
Dialog evaluation is a challenging problem, especially for non-task-oriented dialogs where conversational success is not well-defined. We propose to evaluate dialog quality using topic-based metrics that describe the ability of a conversational bot to sustain coherent and engaging conversations on a topic, and the diversity of topics that a bot can handle. To detect conversation topics per utterance, we adopt Deep Average Networks (DAN) and train a topic classifier on a variety of question and query data categorized into multiple topics. We propose a novel extension to DAN by adding a topic-word attention table that allows the system to jointly capture topic keywords in an utterance and perform topic classification. We compare our proposed topic-based metrics with the ratings provided by users and show that our metrics both correlate with and complement human judgment. Our analysis is…
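To make the notion of topic-based metrics concrete, here is a minimal sketch that turns per-utterance topic labels into conversation-level statistics. The metric definitions are assumptions chosen for illustration (the abstract does not spell out the paper's exact formulas): the number of distinct topics stands in for topic diversity, and the length of consecutive same-topic runs stands in for the ability to sustain a conversation on a topic. The function name `topic_metrics` and the example labels are hypothetical.

```python
from itertools import groupby

def topic_metrics(turn_topics):
    """Compute simple topic statistics for one conversation.

    turn_topics: list of topic labels, one per turn (hypothetical labels,
    e.g. the output of a per-utterance topic classifier).
    """
    if not turn_topics:
        return {"n_distinct_topics": 0, "longest_topic_run": 0, "mean_topic_run": 0.0}
    # Lengths of maximal runs of consecutive turns sharing the same topic.
    runs = [len(list(group)) for _, group in groupby(turn_topics)]
    return {
        "n_distinct_topics": len(set(turn_topics)),  # proxy for topic diversity
        "longest_topic_run": max(runs),              # proxy for sustained depth
        "mean_topic_run": sum(runs) / len(runs),
    }

conversation = ["Music", "Music", "Music", "Sports", "Politics", "Politics"]
stats = topic_metrics(conversation)
```

Statistics like these could then be compared against user ratings across many conversations, for example with a correlation coefficient, to check whether they track human judgment.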
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Sentiment Analysis and Opinion Mining
