Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models

Heeseung Kim; Che Hyun Lee; Sangkwon Park; Jiheum Yeom; Nohil Park; Sangwon Yu; Sungroh Yoon

arXiv:2502.19759 · cs.SD · May 26, 2025

TL;DR

This paper evaluates how well open-source voice interaction models recall and utilize past conversations, revealing significant limitations especially in speech-based models and suggesting directions for improvement.

Contribution

The paper introduces ContextDialog, a benchmark for assessing conversational memory in open-source voice interaction models, and provides a systematic analysis of how well these models recall and use past utterances.

Findings

01. Speech-based models struggle more than text-based models in recalling past utterances.

02. Retrieval-augmented generation does not fully solve memory recall issues.

03. Open-source models have notable limitations in conversational context retention.

Abstract

Recent advancements in multi-turn voice interaction models have improved user-model communication. However, while closed-source models effectively retain and recall past utterances, whether open-source models share this ability remains unexplored. To fill this gap, we systematically evaluate how well open-source interaction models utilize past utterances using ContextDialog, a benchmark we proposed for this purpose. Our findings show that speech-based models have more difficulty than text-based ones, especially when recalling information conveyed in speech, and even with retrieval-augmented generation, models still struggle with questions about past utterances. These insights highlight key limitations in open-source models and suggest ways to improve memory retention and retrieval robustness.

Code & Models

Datasets

ContextDialog/ContextDialog
dataset · 34 downloads

Taxonomy

Topics: Topic Modeling · AI in Service Interactions · Speech and dialogue systems