Training Data Size Sensitivity in Unsupervised Rhyme Recognition
Petr Plecháč, Artjoms Šeļa, Silvie Cinková, Mirella De Sisto, Lara Nugues, Neža Kočnik, Antonina Martynenko, Ben Nagy, Luca Giovannini, Robert Kolár

TL;DR
This paper explores how training data size impacts the accuracy of unsupervised rhyme recognition across multiple languages, using RhymeTagger and comparing it to large language models.
Contribution
It provides a comprehensive analysis of training data requirements for reliable rhyme recognition and benchmarks RhymeTagger against human agreement and LLMs.
Findings
RhymeTagger outperforms human agreement with sufficient data
LLMs struggle without phonetic representations
Training size significantly influences rhyme recognition accuracy
Abstract
Rhyme is deceptively intuitive: what is or is not a rhyme is constructed historically, scholars struggle with rhyme classification, and people disagree on whether two words rhyme. This complicates automated rhyme recognition and its evaluation, especially in a multilingual context. This article investigates how much training data is needed for reliable unsupervised rhyme recognition using RhymeTagger, a language-independent tool that identifies rhymes based on repeating patterns in poetry corpora. We evaluate its performance across seven languages (Czech, German, English, French, Italian, Russian, and Slovene), examining how training size and language differences affect accuracy. To set a realistic performance benchmark, we assess inter-annotator agreement on a manually annotated subset of poems and analyze factors contributing to disagreement in expert annotations: phonetic…
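To make the dependence on training size concrete, here is a minimal toy sketch of learning rhymes from repeating patterns. It is not RhymeTagger's implementation or API: the functions `ending`, `train`, and `score`, the use of the last three letters as a stand-in for a rhyme component, the three-line window, and the scoring formula are all assumptions made for illustration.

```python
from collections import Counter


def ending(line, n=3):
    """Hypothetical rhyme component: last n letters of the line's final word."""
    words = [w.strip(".,;:!?\"'").lower() for w in line.split()]
    words = [w for w in words if w]
    return words[-1][-n:] if words else ""


def train(poems, window=3):
    """Count ending frequencies and co-occurrences of endings within `window` lines."""
    pair_counts, ending_counts = Counter(), Counter()
    for poem in poems:
        ends = [ending(line) for line in poem]
        ending_counts.update(ends)
        for i, a in enumerate(ends):
            for b in ends[i + 1 : i + 1 + window]:
                pair_counts[frozenset((a, b))] += 1
    return pair_counts, ending_counts


def score(a, b, pair_counts, ending_counts):
    """Toy association score: co-occurrences divided by the larger ending frequency."""
    denom = max(ending_counts[a], ending_counts[b])
    return pair_counts[frozenset((a, b))] / denom if denom else 0.0


poem1 = [
    "The stars came out at night",
    "The moon was round and white",
    "We waited for the day",
    "The sky was turning grey",
]
poem2 = [
    "She lit a candle light",
    "He flew a paper kite",
    "They asked us all to stay",
    "But where were they",
]

# With one poem, every pair of endings co-occurs once, so nothing stands out.
small = train([poem1])
# With two poems, the recurring pair (-ght, -ite) separates from chance pairs.
larger = train([poem1, poem2])

for name, (pairs, ends) in {"1 poem": small, "2 poems": larger}.items():
    print(name, score("ght", "ite", pairs, ends), score("ght", "day", pairs, ends))
```

In this toy setup, a single poem gives the rhyming pair and a non-rhyming pair the same score, and adding a second poem is enough to separate them. A real system would need far more data and a phonetic or otherwise richer representation of line endings.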
