GPT Czech Poet: Generation of Czech Poetic Strophes with Language Models

Michal Chudoba; Rudolf Rosa

arXiv:2407.12790·cs.CL·July 19, 2024

TL;DR

This paper presents a novel approach for generating high-quality Czech poetry using fine-tuned language models, emphasizing the importance of tokenization and explicit parameter guidance to improve poetic structure and rhyme accuracy.

Contribution

Introduces a new Czech poetry generation model with explicit parameter control and optimized tokenization, advancing automated poetic creation in less-resourced languages.

Findings

01

Explicit parameter guidance improves poetic structure and rhyme.

02

Syllable or character-based tokenization outperforms subword methods.

03

The approach achieves high accuracy in rhyme and metric quality.
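
As a rough illustration of finding 02, the sketch below shows what character-level tokenization looks like: every character becomes its own token, so verse endings (which carry the rhyme) are never hidden inside larger subword units. The function names `build_vocab`, `encode`, and `decode` are hypothetical and are not taken from the paper's code; the paper's actual tokenizers may differ.

```python
import unicodedata


def build_vocab(texts):
    """Map every distinct character in the corpus to an integer id.

    Texts are NFC-normalized first so that accented Czech letters
    such as 'ů' or 'ž' count as single characters.
    """
    chars = sorted({ch for t in texts for ch in unicodedata.normalize("NFC", t)})
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Turn a string into a list of character ids (one id per character)."""
    return [vocab[ch] for ch in unicodedata.normalize("NFC", text)]


def decode(ids, vocab):
    """Invert encode(): map ids back to characters and join them."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


# Illustrative usage with two Czech words that share the ending "ůže".
vocab = build_vocab(["růže", "kůže"])
ids = encode("růže", vocab)
```

With this scheme the shared ending of "růže" and "kůže" maps to the same three trailing ids, which is one plausible reason a character-level model can track rhyme more directly than a subword model.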

Abstract

High-quality automated poetry generation systems are currently only available for a small subset of languages. We introduce a new model for generating poetry in the Czech language, based on fine-tuning a pre-trained Large Language Model. We demonstrate that guiding the generation process by explicitly specifying strophe parameters within the poem text strongly improves the effectiveness of the model. We also find that appropriate tokenization is crucial, showing that tokenization methods based on syllables or individual characters instead of subwords prove superior in generating poetic strophes. We further enhance the results by introducing Forced generation, adding explicit specifications of meter and verse parameters at inference time based on the already generated text. We evaluate a range of setups, showing that our proposed approach achieves high accuracies in rhyming and…
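
The two ideas in the abstract, writing strophe parameters into the poem text and Forced generation, can be sketched as follows. This is a minimal illustration under stated assumptions: the `#`-separated line layout, the `VerseParams` fields, and the meter label `"T"` are hypothetical choices made for this example and are not the format defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class VerseParams:
    meter: str          # hypothetical meter label, e.g. "T"
    syllables: int      # number of syllables in the verse
    rhyme_ending: str   # final characters of the verse that carry the rhyme


def format_strophe(rhyme_scheme, verses):
    """Serialize a strophe with its parameters written into the text.

    `verses` is a list of (VerseParams, verse_text) pairs. The layout
    (a header line, then one annotated line per verse) is an assumption.
    """
    lines = [f"# {rhyme_scheme} #"]
    for params, text in verses:
        lines.append(
            f"{params.meter} # {params.syllables} # {params.rhyme_ending} # {text}"
        )
    return "\n".join(lines)


def next_verse_prefix(rhyme_scheme, done, meter, syllables):
    """Build the forced parameter prefix for the next verse.

    If the next verse shares its rhyme letter with an already generated
    verse, the ending of that verse is written into the prefix, so the
    model is told which ending to rhyme with. Otherwise the ending is
    left for the model to choose.
    """
    index = len(done)
    letter = rhyme_scheme[index]
    ending = None
    for i, (params, _text) in enumerate(done):
        if rhyme_scheme[i] == letter:
            ending = params.rhyme_ending
            break
    prefix = f"{meter} # {syllables} #"
    if ending is not None:
        prefix += f" {ending} #"
    return prefix


# Placeholder verse texts; only the parameters matter for the sketch.
done = [
    (VerseParams("T", 8, "ála"), "<verse 1>"),
    (VerseParams("T", 7, "ech"), "<verse 2>"),
]
prefix = next_verse_prefix("ABAB", done, meter="T", syllables=8)
```

At inference time, such a prefix would be appended to the text generated so far before the model continues, so each new verse is conditioned on explicit targets derived from the earlier verses rather than on the model's implicit memory of them.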

Code & Models

Models

🤗
jinymusim/gpt-czech-poet
model · 29 downloads · 3 likes

Taxonomy

Topics: Literature, Language, and Rhetoric Studies · Linguistics and language evolution · Language and Culture

Methods: VERtex Similarity Embeddings