DeepASMR: LLM-Based Zero-Shot ASMR Speech Generation for Anyone of Any Voice

Leying Zhang; Tingxiao Zhou; Haiyang Sun; Mengxiao Bi; Yanmin Qian

arXiv:2601.15596 · cs.SD · January 23, 2026

Open Access

TL;DR

DeepASMR is a zero-shot ASMR speech synthesis framework that needs only a short snippet of a speaker's ordinary, read-style speech. It combines LLM-based content encoding with flow-based acoustic decoding to generate high-fidelity ASMR in that speaker's voice.

Contribution

It introduces the first zero-shot ASMR generation method, leveraging a two-stage pipeline and a new multi-speaker ASMR corpus for high-quality, style-specific speech synthesis.

Findings

01. Achieves state-of-the-art naturalness and style fidelity in ASMR synthesis.

02. Requires only a short speech snippet from the target speaker.

03. Maintains competitive performance on normal speech synthesis.

Abstract

While modern Text-to-Speech (TTS) systems achieve high fidelity for read-style speech, they struggle to generate Autonomous Sensory Meridian Response (ASMR), a specialized, low-intensity speech style essential for relaxation. The inherent challenges include ASMR's subtle, often unvoiced characteristics and the demand for zero-shot speaker adaptation. In this paper, we introduce DeepASMR, the first framework designed for zero-shot ASMR generation. We demonstrate that a single short snippet of a speaker's ordinary, read-style speech is sufficient to synthesize high-fidelity ASMR in their voice, eliminating the need for whispered training data from the target speaker. Methodologically, we first identify that discrete speech tokens provide a soft factorization of ASMR style from speaker timbre. Leveraging this insight, we propose a two-stage pipeline incorporating a Large Language Model…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research