VietSuperSpeech: A Large-Scale Vietnamese Conversational Speech Dataset for ASR Fine-Tuning in Chatbot, Customer Support, and Call Center Applications

Loan Do; Thanh Ngoc Nguyen; Thanh Pham; Vinh Do; Hien Nguyen; Charlotte Nguyen

arXiv:2603.01894 · cs.SD · March 3, 2026

TL;DR

VietSuperSpeech is a large-scale, publicly available Vietnamese speech dataset focusing on casual conversational speech, designed to improve ASR models for real-world chatbot and customer support applications.

Contribution

The paper introduces a new Vietnamese conversational speech dataset sourced from four public YouTube channels, filling a gap in resources for recognizing informal Vietnamese speech.

Findings

01. Enhances Vietnamese ASR performance in conversational contexts
02. Provides a valuable resource for training and benchmarking ASR models
03. Fills a critical gap in Vietnamese speech datasets for informal speech

Abstract

We introduce VietSuperSpeech, a large-scale Vietnamese automatic speech recognition (ASR) dataset of 52,023 audio-text pairs totaling 267.39 hours, with a distinctive focus on casual conversational speech. Unlike existing Vietnamese ASR corpora, which predominantly feature read speech, news narration, or audiobook content, VietSuperSpeech is sourced from four publicly accessible YouTube channels spanning everyday conversation, personal vlogging, overseas Vietnamese community dialogue, and informal commentary: the same speech styles encountered in real-world chatbot, customer support, call center, and hotline deployments. All audio is standardized to 16 kHz mono PCM WAV and segmented into utterances of 3–30 seconds. Transcriptions are generated by pseudo-labeling with the Zipformer-30M-RNNT-6000h model (Nguyen, 2025), deployed through Sherpa-ONNX and pre-trained on 6,000 hours of Vietnamese…
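
The audio specification in the abstract (16 kHz, mono, PCM WAV, 3–30 second segments) can be checked mechanically before fine-tuning. The following is a minimal sketch using only Python's standard `wave` module; it is not the authors' preprocessing code. It assumes 16-bit samples, since the abstract says only "PCM", and the helper names `check_segment` and `make_silence` are hypothetical. It also derives the average utterance length implied by the stated totals (267.39 h over 52,023 pairs, about 18.5 s), which sits inside the 3–30 s window.

```python
import io
import wave

# Target format as stated in the abstract; 16-bit sample width is an ASSUMPTION.
TARGET_RATE_HZ = 16_000
MIN_SECONDS = 3.0
MAX_SECONDS = 30.0
ASSUMED_SAMPLE_WIDTH = 2  # bytes, i.e. 16-bit PCM (not stated in the abstract)


def check_segment(wav_file) -> list:
    """Return a list of problems; an empty list means the segment conforms.

    Hypothetical helper. `wav_file` may be a path or a binary file-like object.
    The `wave` module only reads PCM WAV, so a successful open implies PCM.
    """
    problems = []
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        duration = w.getnframes() / rate
    if rate != TARGET_RATE_HZ:
        problems.append(f"sample rate {rate} Hz, expected {TARGET_RATE_HZ}")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != ASSUMED_SAMPLE_WIDTH:
        problems.append(f"sample width {width} bytes, expected {ASSUMED_SAMPLE_WIDTH}")
    if not (MIN_SECONDS <= duration <= MAX_SECONDS):
        problems.append(f"duration {duration:.2f} s outside [{MIN_SECONDS}, {MAX_SECONDS}]")
    return problems


def make_silence(seconds, rate=TARGET_RATE_HZ, channels=1):
    """Hypothetical test helper: build an in-memory 16-bit WAV of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


# Average utterance length implied by the abstract's totals (about 18.5 s).
TOTAL_HOURS = 267.39
NUM_PAIRS = 52_023
avg_seconds = TOTAL_HOURS * 3600 / NUM_PAIRS

if __name__ == "__main__":
    print(check_segment(make_silence(5.0)))   # conforming segment: no problems
    print(check_segment(make_silence(2.0)))   # too short
    print(f"average utterance length: {avg_seconds:.2f} s")
```

In practice `check_segment` would be run over every file in a local copy of the dataset, with nonconforming segments logged rather than silently dropped.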

Taxonomy

Topics: Speech Recognition and Synthesis · AI in Service Interactions · Speech and dialogue systems