Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition

Warit Sirichotedumrong; Adisai Na-Thalang; Potsawee Manakul; Pittawat Taveekitworachai; Sittipong Sripaisarnmongkol; Kunat Pipatanakul

arXiv:2601.13044 · cs.CL · January 21, 2026

TL;DR

This paper introduces Typhoon ASR Real-time, a low-latency FastConformer-Transducer model for Thai speech recognition. It reaches accuracy comparable to Whisper Large-v3 at a 45x lower computational cost, and it supports dialect adaptation and standardized evaluation.

Contribution

The paper presents a compact 115M-parameter FastConformer-Transducer model for Thai ASR, a text normalization pipeline that produces consistent training targets, a dialect adaptation method, and a new benchmark dataset for reproducibility.

Findings

01. 45x reduction in computational cost compared to Whisper Large-v3

02. Achieves comparable accuracy to large offline models

03. Introduces a standardized Thai ASR benchmark dataset

Abstract

Large encoder-decoder models like Whisper achieve strong offline transcription but remain impractical for streaming applications due to high latency. However, due to the accessibility of pre-trained checkpoints, the open Thai ASR landscape remains dominated by these offline architectures, leaving a critical gap in efficient streaming solutions. We present Typhoon ASR Real-time, a 115M-parameter FastConformer-Transducer model for low-latency Thai speech recognition. We demonstrate that rigorous text normalization can match the impact of model scaling: our compact model achieves a 45x reduction in computational cost compared to Whisper Large-v3 while delivering comparable accuracy. Our normalization pipeline resolves systemic ambiguities in Thai transcription, including context-dependent number verbalization and repetition markers (mai yamok), creating consistent training targets. We…
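To make the repetition-marker ambiguity concrete, here is a minimal sketch of one way mai yamok (ๆ, U+0E46) could be expanded so that a repeated word always has a single written form in the training targets. This is not the paper's pipeline, which is not shown in the abstract. The function name `expand_mai_yamok` is hypothetical, and the sketch assumes the transcript has already been split into word tokens with the marker as its own token.

```python
# Illustrative sketch only: NOT the normalization pipeline from the paper.
# Assumption: input is a list of already-segmented Thai word tokens, with the
# repetition mark appearing as a separate token.

MAI_YAMOK = "\u0e46"  # THAI CHARACTER MAIYAMOK (ๆ), the repetition marker


def expand_mai_yamok(tokens):
    """Replace each mai yamok token with a copy of the preceding output token.

    A marker with nothing before it is left unchanged, since there is no word
    to repeat.
    """
    expanded = []
    for token in tokens:
        if token == MAI_YAMOK and expanded:
            # Repeat the word that the marker refers to.
            expanded.append(expanded[-1])
        else:
            expanded.append(token)
    return expanded


if __name__ == "__main__":
    # "เด็ก ๆ" and "เด็ก เด็ก" then map to the same token sequence.
    print(expand_mai_yamok(["เด็ก", MAI_YAMOK]))
```

With this kind of rule, a transcript written with the marker and one written with the word spelled out twice yield the same target. A real system would also need Thai word segmentation and handling for cases where the marker repeats a multi-word phrase, which this sketch does not attempt.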


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques