Mići Princ -- A Little Boy Teaching Speech Technologies the Chakavian Dialect

Nikola Ljubešić; Peter Rupnik; Tea Perinčić

arXiv:2602.03245·eess.AS·February 4, 2026

Open Access

TL;DR

This paper presents a digitized version of The Little Prince in the Chakavian dialect, with text and audio aligned at the word level, to preserve culturally valuable content and to support adapting speech recognition models to dialectal speech.

Contribution

The paper introduces a new aligned audio-text dataset of The Little Prince in the Chakavian dialect and demonstrates its use in adapting a speech recognition model (Whisper-large-v3).

Findings

01

Adapting the model halves the word error rate

02

Character-level error is reduced by up to two thirds

03

The dataset supports a range of AI and dialectal research
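
The first two findings are stated as relative reductions in word error rate (WER) and character-level error. As a minimal sketch of how such figures are commonly computed, the Python below derives both metrics from a Levenshtein edit distance and then takes the relative reduction between a baseline and an adapted system. The function names and the toy English sentences are illustrative assumptions; they are not taken from the paper or its dataset, and the paper's own evaluation setup (for example, text normalization) may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(baseline, adapted):
    """Fraction of the baseline error removed by the adapted system."""
    return (baseline - adapted) / baseline


# Made-up toy transcripts, for illustration only (not from the dataset).
reference = "the little prince lives on a small planet"
baseline_hyp = "the little prints lives in a small planet"
adapted_hyp = "the little prince lives in a small planet"

wer_drop = relative_reduction(wer(reference, baseline_hyp),
                              wer(reference, adapted_hyp))
cer_drop = relative_reduction(cer(reference, baseline_hyp),
                              cer(reference, adapted_hyp))
print(f"relative WER reduction: {wer_drop:.2f}")
print(f"relative CER reduction: {cer_drop:.2f}")
```

In this toy case the baseline transcript contains two word errors and the adapted transcript one, so the WER is halved. The relative drop in character-level error is larger because the remaining wrong word differs from the reference by a single character.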

Abstract

This paper documents our efforts in releasing the printed book and audiobook of the Chakavian-dialect translation of the famous novel The Little Prince as a computer-readable, AI-ready dataset, with the textual and audio components of the two releases now aligned at the level of each written and spoken word. We have several motivations for this release. The first is our wish to preserve this highly valuable and specific content beyond the small editions of the printed book and audiobook. With the dataset published in the CLARIN.SI repository, this content is now at the fingertips of any interested individual. The second motivation is to make the data available for various artificial-intelligence-related usage scenarios, such as the one we already pursue in this paper: adapting the Whisper-large-v3 open automatic speech recognition model, with…

Taxonomy

Topics: Diverse Musicological Studies · Language and cultural evolution · Forensic and Genetic Research