DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation

Kun Zhao; Bohao Yang; Chen Tang; Siyuan Dai; Haoteng Tang; Chenghua Lin; Liang Zhan

arXiv:2506.04516·cs.CL·June 6, 2025

TL;DR

This paper introduces DRE, a dual-refined method that combines small and large language models through adaptive weighting to improve open-domain dialogue evaluation accuracy, outperforming existing approaches.

Contribution

The paper proposes a novel Dual-Refinement Evaluation (DRE) method that effectively integrates small and large language models for more reliable dialogue assessment.

Findings

1. DRE outperforms existing evaluation methods.
2. DRE shows stronger alignment with human judgments.
3. Combining SLMs and LLMs enhances evaluation robustness.

Abstract

Large Language Models (LLMs) excel at many tasks but struggle with ambiguous scenarios where multiple valid responses exist, often yielding unreliable results. Conversely, Small Language Models (SLMs) demonstrate robustness in such scenarios but are susceptible to misleading or adversarial inputs. We observed that LLMs handle negative examples effectively, while SLMs excel with positive examples. To leverage their complementary strengths, we introduce SLIDE (Small and Large Integrated for Dialogue Evaluation), a method integrating SLMs and LLMs via adaptive weighting. Building on SLIDE, we further propose a Dual-Refinement Evaluation (DRE) method to enhance SLM-LLM integration: (1) SLM-generated insights guide the LLM to produce initial evaluations; (2) SLM-derived adjustments refine the LLM's scores for improved accuracy. Experiments demonstrate that DRE outperforms existing methods,…
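The abstract does not give the weighting formula, so the following is only a minimal sketch of what "adaptive weighting" between an SLM score and an LLM score could look like. It assumes both scores are already normalized to [0, 1], and it leans on the stated observation that SLMs are more reliable on positive examples and LLMs on negative ones. The function names, the sigmoid form, and the `sharpness` parameter are all hypothetical and are not taken from the paper.

```python
import math


def adaptive_weight(slm_score: float, sharpness: float = 10.0) -> float:
    """Hypothetical weight on the SLM score.

    Assumption (not from the paper): the weight rises smoothly from 0 to 1
    as the SLM score moves from "negative" (< 0.5) to "positive" (> 0.5).
    """
    return 1.0 / (1.0 + math.exp(-sharpness * (slm_score - 0.5)))


def integrate(slm_score: float, llm_score: float, sharpness: float = 10.0) -> float:
    """Blend the two scores with the adaptive weight.

    When the SLM leans positive, the result follows the SLM;
    when it leans negative, the result follows the LLM.
    """
    for name, value in (("slm_score", slm_score), ("llm_score", llm_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    w = adaptive_weight(slm_score, sharpness)
    return w * slm_score + (1.0 - w) * llm_score


if __name__ == "__main__":
    # SLM is confident the response is good: the blend stays near the SLM score.
    print(round(integrate(0.9, 0.1), 3))
    # SLM leans negative: the blend defers to the LLM score.
    print(round(integrate(0.1, 0.8), 3))
```

Because the result is a convex combination of two values in [0, 1], it always stays in [0, 1]. A hard threshold at 0.5 would be the limiting case of a very large `sharpness`; the smooth version is used here only to avoid a discontinuity in the sketch.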

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Advanced Graph Neural Networks