EchoVoices: Preserving Generational Voices and Memories for Seniors and Children

Haiying Xu; Haoze Liu; Mingshi Li; Siyu Cai; Guangxuan Zheng; Yuhuang Jia; Jinghua Zhao; Yong Qin

arXiv:2507.15221 · cs.SD · July 22, 2025

TL;DR

EchoVoices is an end-to-end digital human pipeline that preserves the voices and memories of seniors and children. It combines speech recognition, speech synthesis, and a memory-driven conversational agent to support intergenerational connection and the creation of digital legacies.

Contribution

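The paper introduces an end-to-end pipeline tailored for seniors and children. It combines a k-NN-enhanced Whisper model for speech recognition, an age-adaptive VITS model for speaker-aware speech synthesis, and an LLM-driven agent that generates persona cards and draws on a memory system.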

Findings

1. Improved speech recognition accuracy on senior and child datasets
2. High-fidelity, speaker-aware speech synthesis results
3. Effective memory system for consistent, personalized interactions

Abstract

Recent breakthroughs in intelligent speech and digital human technologies have primarily targeted mainstream adult users, often overlooking the distinct vocal patterns and interaction styles of seniors and children. These demographics possess distinct vocal characteristics, linguistic styles, and interaction patterns that challenge conventional ASR, TTS, and LLM systems. To address this, we introduce EchoVoices, an end-to-end digital human pipeline dedicated to creating persistent digital personas for seniors and children, ensuring their voices and memories are preserved for future generations. Our system integrates three core innovations: a k-NN-enhanced Whisper model for robust speech recognition of atypical speech; an age-adaptive VITS model for high-fidelity, speaker-aware speech synthesis; and an LLM-driven agent that automatically generates persona cards and leverages a RAG-based…
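
The abstract does not say how the k-NN component is attached to Whisper. One common pattern for k-NN augmentation of a sequence model is to retrieve the nearest stored hidden states from a datastore, turn their distances into a token distribution, and interpolate that with the model's own output distribution. The sketch below illustrates that general pattern only; it is an assumption, not the authors' method. The names `knn_distribution`, `interpolate`, `lam`, and `temperature`, and the toy two-dimensional datastore, are hypothetical.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=3, temperature=1.0):
    """Build a token distribution from the k nearest datastore entries.

    datastore is a list of (key_vector, token) pairs. In a real system the
    keys would be decoder hidden states; here they are toy vectors.
    """
    # Squared Euclidean distance from the query to every stored key.
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist / temperature) for dist, _ in scored]
    total = sum(weights)
    probs = defaultdict(float)
    for weight, (_, token) in zip(weights, scored):
        probs[token] += weight / total
    return dict(probs)


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_model."""
    tokens = set(model_probs) | set(knn_probs)
    return {
        t: lam * knn_probs.get(t, 0.0) + (1 - lam) * model_probs.get(t, 0.0)
        for t in tokens
    }


# Toy example: two candidate tokens, with the datastore favouring "a".
datastore = [([0.0, 0.0], "a"), ([1.0, 0.0], "b"), ([5.0, 5.0], "c")]
knn_probs = knn_distribution([0.0, 0.0], datastore, k=2)
mixed = interpolate({"a": 0.2, "b": 0.8}, knn_probs, lam=0.5)
```

In this setup `lam` controls how far the retrieved neighbours can shift the base model's prediction. A datastore built from senior or child speech would, under this assumption, pull predictions toward tokens seen in similar acoustic contexts.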
