PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs
Zhan Qu, Shuzhou Yuan, Michael Färber

TL;DR
This paper systematically evaluates large language models' ability to generate classical Chinese Songci poetry with strict structural, tonal, and rhyme constraints, introducing a new evaluation framework and a Generate-Critic architecture.
Contribution
It develops a comprehensive evaluation framework for constrained poetry generation and proposes a Generate-Critic architecture to improve LLM performance through automated feedback.
Findings
Evaluation framework effectively measures LLM performance
Fine-tuning with the critic improves conformity scores
Analysis reveals strengths and limitations of LLMs in cultural text generation
Abstract
This paper presents a systematic investigation into the constrained generation capabilities of large language models (LLMs) in producing Songci, a classical Chinese poetry form characterized by strict structural, tonal, and rhyme constraints defined by Cipai templates. We first develop a comprehensive, multi-faceted evaluation framework that includes: (i) a formal conformity score, (ii) automated quality assessment using LLMs, (iii) human evaluation, and (iv) classification-based probing tasks. Using this framework, we evaluate the generative performance of 18 LLMs, including 3 proprietary models and 15 open-source models across 4 families, under five prompting strategies: zero-shot, one-shot, completion-based, instruction-based, and chain-of-thought. Finally, we propose a Generate-Critic architecture in which the evaluation framework functions as an automated critic. Leveraging the…
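To make the idea of a formal conformity score concrete, the following is a minimal sketch, not the paper's actual metric. It assumes a Cipai template can be written as a list of per-line tone patterns (`P` for level, `Z` for oblique, `x` for either) plus a flag marking rhyming lines. The names `TOY_LEXICON` and `conformity_score`, the rhyme-group labels, and the equal weighting of all checks are hypothetical choices made for illustration; a real system would need a full rhyme dictionary.

```python
# Toy lookup (assumption): character -> (tone class, rhyme group).
# "P" = level (ping), "Z" = oblique (ze). The rhyme-group labels are made up.
TOY_LEXICON = {
    "春": ("P", "en"), "风": ("P", "eng"), "花": ("P", "a"), "家": ("P", "a"),
    "月": ("Z", "ue"), "雨": ("Z", "u"), "夜": ("Z", "ie"), "水": ("Z", "ui"),
}


def conformity_score(poem_lines, template, lexicon=TOY_LEXICON):
    """Return the fraction of template checks that a poem satisfies.

    template: list of (tone_pattern, rhymes) pairs, one per line, where
    tone_pattern is a string over {"P", "Z", "x"} and rhymes is a bool.
    """
    passed = 0
    total = 0
    rhyme_groups = []
    for i, (pattern, rhymes) in enumerate(template):
        line = poem_lines[i] if i < len(poem_lines) else ""
        # Structural check: the line must have the prescribed length.
        total += 1
        passed += len(line) == len(pattern)
        # Tonal check: every constrained position must match its tone class.
        for j, want in enumerate(pattern):
            if want == "x":
                continue
            total += 1
            ch = line[j] if j < len(line) else None
            got = lexicon.get(ch, (None, None))[0]
            passed += got == want
        # Record the rhyme group of the final character of rhyming lines.
        if rhymes:
            last = line[-1] if line else None
            rhyme_groups.append(lexicon.get(last, (None, None))[1])
    # Rhyme check: later rhyming lines must share the first one's group.
    for group in rhyme_groups[1:]:
        total += 1
        passed += group is not None and group == rhyme_groups[0]
    return passed / total if total else 0.0


toy_template = [("PZP", True), ("ZxP", True)]
good_score = conformity_score(["春雨花", "夜月家"], toy_template)
bad_score = conformity_score(["春雨花", "夜月水"], toy_template)
```

In this toy setup the first poem passes every check, while the second breaks both the final tone and the rhyme of its last line. A scalar of this kind is the sort of signal an automated critic could hand back to a generator.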
Taxonomy
Topics: Artificial Intelligence in Games · Topic Modeling · Digital Humanities and Scholarship
