GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models
Michal Chudoba, Rudolf Rosa

TL;DR
This paper presents a novel approach for generating high-quality Czech poetry using fine-tuned language models, emphasizing the importance of tokenization and explicit parameter guidance to improve poetic structure and rhyme accuracy.
Contribution
Introduces a new Czech poetry generation model with explicit parameter control and optimized tokenization, advancing automated poetic creation in less-resourced languages.
Findings
Explicit parameter guidance improves poetic structure and rhyme.
Syllable-based or character-based tokenization outperforms subword tokenization for generating poetic strophes.
The approach achieves high accuracy in rhyme and metrical quality.
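To make the tokenization finding concrete, the sketch below contrasts character-level tokens with a deliberately naive syllable split. It is not the tokenizer used in the paper: the function names, the vowel inventory, and the diphthong rule (ou, au, eu) are illustrative assumptions, and real Czech syllabification (syllabic r and l, prefix boundaries) is more involved.

```python
import re

# Assumption: a simplified Czech vowel inventory, lowercase only.
VOWELS = "aeiouyáéěíóúůý"

# One naive "syllable" = optional onset consonants + one nucleus
# (a diphthong ou/au/eu or a single vowel) + any word-final consonants.
SYLLABLE = re.compile(
    rf"[^{VOWELS}]*(?:ou|au|eu|[{VOWELS}])(?:[^{VOWELS}]+$)?"
)


def char_tokens(text):
    """Character-level tokenization: every character is its own token."""
    return list(text)


def naive_syllables(word):
    """Hypothetical helper: split a single word into rough syllables.

    Words without any vowel (e.g. with syllabic r or l) are returned whole.
    """
    lowered = word.lower()
    parts = SYLLABLE.findall(lowered)
    return parts if parts else [lowered]
```

With tokens like these, the number of syllables in a verse and the characters of its ending are directly visible to the model, which subword vocabularies tend to obscure.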
Abstract
High-quality automated poetry generation systems are currently only available for a small subset of languages. We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained Large Language Model. We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model. We also find that appropriate tokenization is crucial, showing that tokenization methods based on syllables or individual characters instead of subwords prove superior in generating poetic strophes. We further enhance the results by introducing Forced generation, adding explicit specifications of meter and verse parameters at inference time based on the already generated text. We evaluate a range of setups, showing that our proposed approach achieves high accuracies in rhyming and…
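One possible reading of parameter guidance and Forced generation can be sketched as follows. Everything concrete here is an assumption made for illustration, not the paper's format: the "# ABAB #" header, the per-line "label ending # " prefix, the two-letter rhyme key, and the `next_line` callback that stands in for the fine-tuned model. The point is only that the prefix for each new verse is computed from the verses already generated and written into the text before the model continues.

```python
def rhyme_ending(line, n=2):
    """Hypothetical rhyme key: the last n letters of the line's final word."""
    kept = "".join(c for c in line.lower() if c.isalpha() or c == " ")
    words = kept.split()
    return words[-1][-n:] if words else ""


def forced_prefix(scheme, done_lines, n=2):
    """Build the parameter prefix for the next verse from earlier verses.

    If an earlier verse shares the rhyme label, its ending is made explicit;
    otherwise the ending is left open ("?").
    """
    i = len(done_lines)
    label = scheme[i]
    for j in range(i):
        if scheme[j] == label:
            return f"{label} {rhyme_ending(done_lines[j], n)} # "
    return f"{label} ? # "


def generate_strophe(scheme, next_line):
    """Generate one strophe verse by verse, forcing parameters before each verse.

    next_line(prompt) is a stand-in for a language model call that returns
    the text of one verse given the prompt so far.
    """
    lines = []
    prompt = f"# {scheme} #\n"  # assumed strophe-level parameter header
    for _ in scheme:
        prompt += forced_prefix(scheme, lines)
        line = next_line(prompt)
        lines.append(line)
        prompt += line + "\n"
    return prompt, lines
```

In use, `next_line` would wrap the model's decoding call; for a quick check it can be a stub that returns fixed verses, which makes it easy to inspect the prefixes written into the prompt.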
Taxonomy
Topics: Literature, Language, and Rhetoric Studies · Linguistics and language evolution · Language and Culture
Methods: VERtex Similarity Embeddings
