SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation

Kun Zhao; Bohao Yang; Chen Tang; Chenghua Lin; Liang Zhan

arXiv:2405.15924 · cs.CL · May 31, 2024 · 1 citation

TL;DR

SLIDE is a new framework that combines small specialized models and large language models to improve the automatic evaluation of open-domain dialogues, addressing the one-to-many response problem and domain-specific challenges.

Contribution

The paper introduces SLIDE, a novel framework integrating small and large models with contrastive learning and semantic sensitivity metrics for superior dialogue evaluation.

Findings

01. Achieves state-of-the-art performance in dialogue evaluation tasks.
02. Correlates better with human judgments than existing metrics.
03. Effectively addresses the one-to-many response problem.

Abstract

The long-standing one-to-many problem of gold standard responses in open-domain dialogue systems presents challenges for automatic evaluation metrics. Though prior works have demonstrated some success by applying powerful Large Language Models (LLMs), existing approaches still struggle with the one-to-many problem and exhibit subpar performance in domain-specific scenarios. We assume the commonsense reasoning biases within LLMs may hinder their performance in domain-specific evaluations. To address both issues, we propose a novel framework, SLIDE (Small and Large Integrated for Dialogue Evaluation), that leverages both a small, specialised model (SLM) and LLMs for the evaluation of open-domain dialogues. Our approach introduces several techniques: (1) Contrastive learning to differentiate between robust and non-robust response embeddings; (2) A novel metric for semantic sensitivity that…
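
Technique (1) in the abstract can be illustrated with a triplet-style contrastive loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: the triplet hinge formulation, the cosine distance, the `margin` value, the use of the dialogue context as the anchor, and the toy two-dimensional vectors are all hypothetical choices made for illustration. The abstract only says that contrastive learning separates robust from non-robust response embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss on cosine distance.

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`; otherwise it grows with the violation.
    """
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical values, not taken from the paper).
context = [1.0, 0.0]      # anchor: assumed to be the dialogue context
robust = [0.9, 0.1]       # embedding of a robust response
non_robust = [0.0, 1.0]   # embedding of a non-robust response

# Well-separated embeddings incur no loss.
loss_separated = triplet_loss(context, robust, non_robust)

# Swapping the roles simulates an encoder that confuses the two classes.
loss_confused = triplet_loss(context, non_robust, robust)
```

In this toy setting `loss_separated` is zero because the robust response already sits much closer to the context than the non-robust one, while `loss_confused` is positive and would push an encoder to reorder the embeddings during training.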

Code & Models

Repositories

hegehongcha/slide-acl2024
Official

Datasets

yangbh217/SLIDE-ACL2024-Dataset
dataset · 34 downloads

Videos

SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation · Underline

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling

Methods: Contrastive Learning