SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Kun Zhao, Bohao Yang, Chen Tang, Chenghua Lin, Liang Zhan

TL;DR
SLIDE is a new framework that combines small specialized models and large language models to improve the automatic evaluation of open-domain dialogues, addressing the one-to-many response problem and domain-specific challenges.
Contribution
The paper introduces SLIDE, a novel framework integrating small and large models with contrastive learning and semantic sensitivity metrics for superior dialogue evaluation.
Findings
SLIDE achieves state-of-the-art performance on open-domain dialogue evaluation tasks.
It correlates better with human judgments than existing metrics.
It effectively addresses the one-to-many response problem.
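As an illustration of what "correlation with human judgments" usually means for an evaluation metric, the sketch below computes a Spearman rank correlation between metric scores and human ratings. It is a minimal, dependency-free example, not the paper's evaluation code: the function names and the sample numbers are hypothetical, and the summary above does not state which correlation coefficient the authors report.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(metric_scores, human_ratings):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(metric_scores), average_ranks(human_ratings))


# Hypothetical scores for four responses (made-up numbers, not paper data).
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_ratings = [1, 3, 2, 5]
rho = spearman(metric_scores, human_ratings)
```

A value of `rho` close to 1 means the metric orders responses the same way human annotators do, which is the sense in which one metric is said to correlate "better" than another.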
Abstract
The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem and exhibit subpar performance in domain-specific scenarios. We assume the commonsense reasoning biases within LLMs may hinder their performance in domain-specific evaluations. To address both issues, we propose a novel framework, SLIDE (Small and Large Integrated for Dialogue Evaluation), that leverages both a small, specialised model (SLM) and LLMs for the evaluation of open-domain dialogues. Our approach introduces several techniques: (1) Contrastive learning to differentiate between robust and non-robust response embeddings; (2) A novel metric for semantic sensitivity that…
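The contrastive learning step in (1) can be pictured with a generic triplet-style objective that pulls a robust response embedding toward the dialogue context and pushes a non-robust one away. The sketch below is an assumption for illustration only: the abstract does not specify the loss, the distance function, or the margin, so `cosine_distance`, `triplet_loss`, the `margin` value, and the toy two-dimensional embeddings are all hypothetical.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss: zero once the positive is closer than the negative by `margin`."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (assumed): the context acts as the anchor.
context = [1.0, 0.0]
robust_response = [1.0, 0.0]      # same direction as the context
non_robust_response = [0.0, 1.0]  # orthogonal to the context

well_separated = triplet_loss(context, robust_response, non_robust_response)
badly_separated = triplet_loss(context, non_robust_response, robust_response)
```

Here `well_separated` is zero because the robust response already sits closer to the context than the non-robust one by more than the margin, while `badly_separated` is positive, which is the signal a training loop would minimise.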
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling
Methods: Contrastive Learning
