MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation

Junqing He; Liang Zhu; Rui Wang; Xi Wang; Reza Haffari; Jiaxing Zhang

arXiv:2409.15240·cs.CL·October 24, 2024

TL;DR

This paper introduces MADial-Bench, a comprehensive benchmark for evaluating memory-augmented dialogue systems, emphasizing diverse memory recall and human-like response qualities beyond traditional metrics.

Contribution

It creates a novel benchmark based on cognitive science, incorporating new evaluation criteria for memory recall, emotion support, and intimacy in dialogue systems.

Findings

01. Embedding models show potential for improvement.

02. Memory injection correlates with emotion support.

03. Large language models perform well on the benchmark.

Abstract

Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, evidenced by numerous developed memory-augmented DS (MADS). To evaluate the effectiveness of such MADS, existing commonly used evaluation metrics, like retrieval accuracy and perplexity (PPL), mainly focus on query-oriented factualness and language quality assessment. However, these metrics often lack practical value. Moreover, the evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recalling paradigms, current evaluation schemes only consider passive memory retrieval while ignoring diverse memory recall with rich triggering factors, e.g., emotions and surroundings, which can be essential in emotional support scenarios. To bridge the gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADial-Bench) covering various…

Taxonomy

Topics: Speech and dialogue systems · Context-Aware Activity Recognition Systems · Intelligent Tutoring Systems and Adaptive Learning
