Rethinking Evaluation in Retrieval-Augmented Personalized Dialogue: A Cognitive and Linguistic Perspective

Tianyi Zhang; David Traum

arXiv:2603.14217·cs.CL·March 24, 2026

Open Access

TL;DR

This paper critiques current evaluation metrics for retrieval-augmented personalized dialogue, highlighting their shortcomings and proposing a cognitively grounded approach aligned with human conversational principles.

Contribution

It re-examines the LAPDOG framework, identifying evaluation limitations and advocating for assessment methods rooted in cognitive and linguistic theories of dialogue.

Findings

01. Human and LLM judgments align closely

02. Lexical similarity metrics often diverge from human judgments

03. Current metrics fail to capture coherence and shared understanding

Abstract

In cognitive science and linguistic theory, dialogue is not seen as a chain of independent utterances but rather as a joint activity sustained by coherence, consistency, and shared understanding. However, many systems for open-domain and personalized dialogue use surface-level similarity metrics (e.g., BLEU, ROUGE, F1) as one of their main reporting measures, which fail to capture these deeper aspects of conversational quality. We re-examine a notable retrieval-augmented framework for personalized dialogue, LAPDOG, as a case study for evaluation methodology. Using both human and LLM-based judges, we identify limitations in current evaluation practices, including corrupted dialogue histories, contradictions between retrieved stories and persona, and incoherent response generation. Our results show that human and LLM judgments align closely but diverge from lexical similarity metrics,…
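To make the concern about surface-level metrics concrete, here is a minimal sketch of a token-level F1 score, one common form of the lexical overlap measures the abstract mentions. It is not the paper's evaluation code: the function name `token_f1`, the whitespace tokenization, and all three example sentences are assumptions made up for illustration.

```python
from collections import Counter


def token_f1(candidate: str, reference: str) -> float:
    """Token-level F1 over lowercased, whitespace-split tokens (illustrative only)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, not taken from the paper or any dataset.
reference = "i love my dog and walk him every morning"
contradictory = "i hate my dog and never walk him"  # contradicts the reference
paraphrase = "taking my puppy out at sunrise is my favorite routine"  # consistent with it

f1_contradictory = token_f1(contradictory, reference)
f1_paraphrase = token_f1(paraphrase, reference)

print(f"contradictory response: {f1_contradictory:.3f}")
print(f"consistent paraphrase:  {f1_paraphrase:.3f}")
```

In this toy case the contradictory response shares six of its eight tokens with the reference and scores far higher than the consistent paraphrase, which shares only one. That is the kind of gap between lexical overlap and conversational quality the abstract describes, though a single invented pair says nothing about how often it occurs in practice.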

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning