Open Source State-Of-the-Art Solution for Romanian Speech Recognition

Gabriel Pirlogeanu; Alexandru-Lucian Georgescu; Horia Cucu

arXiv:2511.03361 · eess.AS · November 6, 2025

TL;DR

This paper introduces a Romanian speech recognition system built on NVIDIA's FastConformer architecture and trained on over 2,600 hours of mostly weakly supervised speech. It achieves state-of-the-art accuracy across read, spontaneous, and domain-specific benchmarks while remaining efficient to decode.

Contribution

This work is the first to apply FastConformer to Romanian ASR. It reduces WER by up to 27% relative to previous best-performing systems and demonstrates practical decoding efficiency.
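For reference, WER is the word-level edit distance between reference and hypothesis (substitutions + deletions + insertions) divided by the number of reference words. A "relative" reduction is measured against the baseline WER rather than in absolute points, so a drop from 10.0% to 7.3% WER counts as 27% relative (2.7 points absolute). The sketch below computes both; the Romanian sentence pairs and the 10.0/7.3 WER values are invented for illustration and do not come from the paper, and text normalization (casing, punctuation, diacritics) is deliberately ignored.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Word-level edit distance (S + D + I) and the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + sub)    # substitution or match
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = words = 0
    for ref, hyp in pairs:
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    return errors / words


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors removed, e.g. 0.27 for a 27% relative reduction."""
    return (baseline_wer - new_wer) / baseline_wer


pairs = [("ana are mere", "ana are pere"),   # one substitution
         ("ana are mere", "ana mere")]       # one deletion
print(corpus_wer(pairs))                            # 2 errors / 6 words
print(round(relative_wer_reduction(10.0, 7.3), 2))  # 0.27 (illustrative WERs, not from the paper)
```

Corpus-level WER pools errors over all utterances instead of averaging per-utterance rates, so long utterances weigh more than short ones.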

Findings

01 Achieved up to 27% relative WER reduction compared to previous best-performing systems.

02 Achieved state-of-the-art results across read, spontaneous, and domain-specific speech.

03 Demonstrated practical decoding efficiency for low-latency applications.
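Item 03 concerns decoding speed. The source does not say where the efficiency comes from, but one relevant property of TDT models, as described in NVIDIA's TDT work, is that the joint network predicts a duration alongside each token, so greedy decoding can jump over encoder frames instead of visiting every one. The loop below is a simplified sketch of that idea, not the paper's or NeMo's decoder: the `joint` callable is a hypothetical stand-in for the encoder, predictor, and joint networks, and the blank id, the duration set (0 to 4 frames), and the per-frame symbol cap are assumed values.

```python
from typing import Callable, List, Sequence, Tuple

BLANK = 0                     # hypothetical blank token id
DURATIONS = (0, 1, 2, 3, 4)   # assumed duration set (frames to advance)

# Hypothetical joint interface: (frame index, emitted tokens) -> (token scores, duration scores).
JointFn = Callable[[int, Tuple[int, ...]], Tuple[Sequence[float], Sequence[float]]]


def argmax(scores: Sequence[float]) -> int:
    return max(range(len(scores)), key=lambda i: scores[i])


def tdt_greedy_decode(num_frames: int, joint: JointFn, max_symbols_per_frame: int = 5) -> List[int]:
    """Greedy TDT-style decoding: predict a token AND how many frames to advance."""
    tokens: List[int] = []
    t = 0
    emitted_here = 0  # tokens emitted at the current frame without advancing
    while t < num_frames:
        token_scores, duration_scores = joint(t, tuple(tokens))
        token = argmax(token_scores)
        skip = DURATIONS[argmax(duration_scores)]
        if token != BLANK:
            tokens.append(token)          # the predictor context changes only on non-blank
            emitted_here += 1
        elif skip == 0:
            skip = 1                      # a blank must consume at least one frame
        if skip == 0 and emitted_here >= max_symbols_per_frame:
            skip = 1                      # guard against emitting forever at one frame
        if skip > 0:
            emitted_here = 0
        t += skip                         # durations > 1 skip frames entirely
    return tokens


# Toy scripted joint (hypothetical values): frame -> (token, duration).
script = {0: (5, 3), 3: (7, 2), 5: (BLANK, 4)}
visited: List[int] = []


def toy_joint(t: int, prefix: Tuple[int, ...]) -> Tuple[List[float], List[float]]:
    visited.append(t)
    token, dur = script[t]
    token_scores = [0.0] * 8
    token_scores[token] = 1.0
    duration_scores = [0.0] * len(DURATIONS)
    duration_scores[DURATIONS.index(dur)] = 1.0
    return token_scores, duration_scores


print(tdt_greedy_decode(9, toy_joint), visited)   # [5, 7] [0, 3, 5]
```

In the toy run, nine encoder frames are covered with three joint-network calls, whereas a standard transducer's greedy decoder would evaluate every frame at least once.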

Abstract

In this work, we present a new state-of-the-art Romanian Automatic Speech Recognition (ASR) system based on NVIDIA's FastConformer architecture, explored here for the first time in the context of Romanian. We train our model on a large corpus of mostly weakly supervised transcriptions totaling over 2,600 hours of speech. Using a hybrid decoder with both Connectionist Temporal Classification (CTC) and Token-Duration Transducer (TDT) branches, we evaluate a range of decoding strategies, including greedy decoding, alignment-length synchronous decoding (ALSD), and CTC beam search with a 6-gram token-level language model. Our system achieves state-of-the-art performance across all Romanian evaluation benchmarks, including read, spontaneous, and domain-specific speech, with up to 27% relative WER reduction compared to previous best-performing systems. In addition to improved transcription accuracy, our approach demonstrates practical…
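The abstract lists CTC beam search with a 6-gram token-level language model among the decoding strategies. To illustrate how this kind of decoding generally works, here is a minimal CTC prefix beam search with shallow fusion: each candidate prefix tracks the probability of paths ending in blank and in non-blank, and each newly appended token's acoustic log-probability is combined with an LM log-probability weighted by `alpha`, plus an insertion bonus `beta`. Everything here is an assumption for illustration, not the paper's decoder or NeMo's implementation: the `lm_logprob` callable, the default `alpha`/`beta` values, and the toy `make_ngram_lm` lookup, which has no backoff (unlike a real 6-gram model) and omits end-of-sentence scoring.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

NEG_INF = float("-inf")
Prefix = Tuple[int, ...]
# Hypothetical LM interface: log P(token | previous tokens).
LMFn = Callable[[Prefix, int], float]


def logsumexp(*xs: float) -> float:
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_prefix_beam_search(
    log_probs: Sequence[Sequence[float]],  # [T][V] CTC log-softmax outputs
    lm_logprob: LMFn,
    alpha: float = 0.5,    # LM weight (illustrative value)
    beta: float = 0.0,     # per-token insertion bonus (illustrative value)
    beam_size: int = 8,
    blank: int = 0,
) -> Tuple[Prefix, float]:
    # Each prefix keeps (log P of paths ending in blank, log P of paths ending in non-blank).
    beams: Dict[Prefix, Tuple[float, float]] = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt: Dict[Prefix, Tuple[float, float]] = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for c, p in enumerate(frame):
                if c == blank:
                    # Blank leaves the prefix unchanged and ends it in blank.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + p, p_nb + p), nb)
                    continue
                new = prefix + (c,)
                fused = p + alpha * lm_logprob(prefix, c) + beta  # shallow fusion
                b, nb = nxt[new]
                if prefix and prefix[-1] == c:
                    # A repeated token extends the prefix only across a blank...
                    nxt[new] = (b, logsumexp(nb, p_b + fused))
                    # ...otherwise it collapses into the same prefix (no LM cost).
                    b0, nb0 = nxt[prefix]
                    nxt[prefix] = (b0, logsumexp(nb0, p_nb + p))
                else:
                    nxt[new] = (b, logsumexp(nb, p_b + fused, p_nb + fused))
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return best, logsumexp(p_b, p_nb)


def make_ngram_lm(table: Dict[Prefix, float], n: int, unk_logprob: float = -10.0) -> LMFn:
    """Hypothetical toy n-gram lookup with no backoff (a real 6-gram LM would back off)."""
    def lm(context: Prefix, token: int) -> float:
        history = tuple(context[-(n - 1):]) if n > 1 else ()
        return table.get(history + (token,), unk_logprob)
    return lm


if __name__ == "__main__":
    frames = [[math.log(0.1), math.log(0.5), math.log(0.4)]]  # toy: blank, token 1, token 2
    lm = make_ngram_lm({(1,): math.log(0.1), (2,): math.log(0.9)}, n=1)
    print(ctc_prefix_beam_search(frames, lm, alpha=0.0)[0])   # (1,)
    print(ctc_prefix_beam_search(frames, lm, alpha=1.0)[0])   # (2,)
```

With `alpha=0` this reduces to plain CTC prefix beam search; larger weights let the LM override acoustically close alternatives, which is why such weights are usually tuned on held-out data rather than fixed in advance.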

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research