Hallucination Level of Artificial Intelligence Whisperer: Case Speech Recognizing Pantterinousut Rap Song
Ismo Horppu, Frederick Ayala, Erlin Gulbenkoglu

TL;DR
This paper evaluates the hallucination levels of AI speech-to-text systems by transcribing a Finnish rap song with Faster Whisperer and YouTube's internal speech-to-text tool, and highlights the challenges of recognizing complex, artistic language.
Contribution
It introduces a novel case study assessing hallucination and mishearing in AI speech recognition on Finnish rap, where both the language and the artistic delivery make transcription difficult.
Findings
Faster Whisperer and YouTube's speech-to-text show different error patterns.
Transcribing Finnish rap lyrics presents significant challenges for AI models.
The study provides insights into hallucination levels in complex language scenarios.
Abstract
All languages are peculiar. Some of them are considered more challenging to understand than others. Finnish is known to be a complex language. Also, when languages are used by artists, the pronunciation and meaning might be trickier to understand. Therefore, we are putting AI to a fun, yet challenging trial: transcribing a Finnish rap song to text. We will compare the Faster Whisperer algorithm and YouTube's internal speech-to-text functionality. The reference truth will be Finnish rap lyrics, which the main author's little brother, Mc Timo, has written. Transcribing the lyrics will be challenging because the artist raps over synth music played by Syntikka Janne. The hallucination level and mishearing of AI speech-to-text extractions will be measured by comparing errors made against the original Finnish lyrics. The error function is informal but still works for our case.
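The abstract does not spell out its informal error function. As an illustration only, here is a minimal sketch of one common way to score a transcript against reference lyrics: word error rate, computed from a word-level edit distance. This is a standard swapped-in technique, not the authors' measure. The function names and the sample strings are hypothetical; the sample text is placeholder Finnish number words, not the song's lyrics.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the hypothesis into the reference (Levenshtein distance
    over whitespace-separated, lowercased words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = distance between the first i-1 reference words and
    # the first j hypothesis words; start with the empty reference.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions needed against an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word errors divided by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n_ref


if __name__ == "__main__":
    # Placeholder text (Finnish number words), not actual lyrics.
    reference_line = "yksi kaksi kolme neljä"
    transcript_line = "yksi kaksi kolmen"
    print(word_errors(reference_line, transcript_line))      # 2
    print(word_error_rate(reference_line, transcript_line))  # 0.5
```

A word-level score like this treats every mismatch equally, so it cannot by itself separate a near-miss mishearing from a fully invented phrase; distinguishing the two would need extra judgment, which may be one reason to prefer an informal error function.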
Taxonomy
Topics: Music and Audio Processing · Authorship Attribution and Profiling · Mental Health via Writing
