Toward Conversational Hungarian Speech Recognition: Introducing the BEA-Large and BEA-Dialogue Datasets

Máté Gedeon; Piroska Zsófia Barta; Péter Mihajlik; Tekla Etelka Gráczi; Anna Kohári; Katalin Mády

arXiv:2511.13529 · cs.CL · January 15, 2026

TL;DR

This paper introduces two new Hungarian speech datasets, BEA-Large and BEA-Dialogue, to advance research on spontaneous and conversational speech recognition for underrepresented languages, and reports baseline results on both.

Contribution

The paper presents the BEA-Large and BEA-Dialogue datasets, which provide new resources and baseline benchmarks for Hungarian conversational speech recognition.

Findings

1. Fast Conformer achieved a word error rate (WER) of 14.18% on spontaneous speech.
2. Diarization error rates (DER) ranged from 12.46% to 17.40%.
3. The datasets support research in conversational ASR and speaker diarization.
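
The two metrics reported in the findings are standard ones. As a minimal sketch, assuming whitespace tokenization and no text normalization (the paper's exact scoring setup, such as normalization rules, forgiveness collar, and overlap handling, is not given in this summary), WER is the word-level edit distance divided by the number of reference words, and DER is the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time. The function names below are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumption: plain whitespace tokenization, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def diarization_error_rate(
    false_alarm: float, missed: float, confusion: float, total_speech: float
) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / total reference speech.

    All arguments are durations in the same unit (e.g. seconds); how they are
    measured (collar, overlapped speech) is an assumption left to the scorer.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed + confusion) / total_speech
```

For example, with the made-up reference "a kutya a kertben fut" and hypothesis "a kutya kertben futott", one deletion and one substitution over five reference words give a WER of 0.4. Because insertions count as errors, WER can exceed 1.0.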

Abstract

The advancement of automatic speech recognition (ASR) has been largely enhanced by extensive datasets in high-resource languages, while languages such as Hungarian remain underrepresented due to limited spontaneous and conversational corpora. To address this gap, we introduce two new datasets -- BEA-Large and BEA-Dialogue -- constructed from the previously unprocessed portions of the Hungarian speech corpus named BEA. BEA-Large extends BEA-Base with 255 hours of spontaneous speech from 433 speakers, enriched with detailed segment-level metadata. BEA-Dialogue, comprising 85 hours of spontaneous conversations, is a Hungarian speech corpus featuring natural dialogues partitioned into speaker-independent subsets, supporting research in conversational ASR and speaker diarization. We establish reproducible baselines on these datasets using publicly available ASR models, with the fine-tuned…
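
In a speaker-independent partition, no speaker's speech appears in more than one subset, so evaluation results reflect speakers the model has not heard during training. The sketch below illustrates the idea by assigning whole speakers, not individual segments, to train, dev, and test. It assumes each segment carries exactly one speaker label; the function name, the 80/10/10 proportions, and the seed are hypothetical, and the actual BEA-Dialogue partitioning procedure is not described in this summary.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def speaker_independent_split(
    segments: Sequence[Tuple[str, str]],
    train_frac: float = 0.8,  # hypothetical proportion of speakers for training
    dev_frac: float = 0.1,    # hypothetical proportion for dev; test gets the rest
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Split (segment_id, speaker_id) pairs so each speaker lands in one subset."""
    speakers = sorted({spk for _, spk in segments})
    random.Random(seed).shuffle(speakers)  # reproducible speaker order

    n_train = int(train_frac * len(speakers))
    n_dev = int(dev_frac * len(speakers))

    # Decide the subset once per speaker, never per segment.
    subset_of: Dict[str, str] = {}
    for idx, spk in enumerate(speakers):
        if idx < n_train:
            subset_of[spk] = "train"
        elif idx < n_train + n_dev:
            subset_of[spk] = "dev"
        else:
            subset_of[spk] = "test"

    splits: Dict[str, List[str]] = defaultdict(list)
    for seg_id, spk in segments:
        splits[subset_of[spk]].append(seg_id)
    return dict(splits)
```

Splitting at the speaker level means the subsets' durations only approximate the target proportions, since speakers contribute different amounts of speech; a real partition would typically also balance duration or speaker attributes.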

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Dialogue Systems · Natural Language Processing Techniques