Structured Style-Rewrite with Chain-of-Thought Planning for Low-Resource Character Dialogue
Chanhui Zhu

TL;DR
This paper introduces a structured style-rewrite framework with chain-of-thought supervision for low-resource Chinese character dialogue generation, improving style control and semantic fidelity in small language models.
Contribution
It proposes a decomposed style representation (format signature, syntactic, and pragmatic dimensions) combined with chain-of-thought planning and preference optimization, with the aim of improving style accuracy and semantic consistency.
Findings
Achieved a Valid Style Score of 0.632 on eight characters.
Maintained semantic fidelity of 0.878.
Outperformed larger baseline models on consumer hardware.
Abstract
Applying Small Language Models (SLMs) to Chinese character-driven generation remains challenging due to data scarcity and the difficulty of disentangling character style. Standard Supervised Fine-Tuning (SFT) often captures surface-level semantics but produces frequent Out-Of-Character (OOC) outputs. We frame this as a controlled sentence-level style rewriting task, which isolates stylistic quality from dialogue context management. We propose a Structured Style-Rewrite Framework that decomposes character style into interpretable format signature, syntactic, and pragmatic dimensions, combined with Chain-of-Thought (CoT) supervision for explicit style planning. A CoT-Shared Direct Preference Optimization (DPO) stage further aligns style planning with surface realization by ensuring preference learning targets output-level style execution rather than reasoning trace differences.…
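The CoT-shared preference step described in the abstract can be illustrated with a small sketch. It assumes one plausible reading: the chosen and rejected completions share the same style plan, so the plan is placed in the prompt and only the final rewrite differs between the two. The names `StylePlan`, `build_cot_shared_pair`, the prompt template, and the example strings are hypothetical and not taken from the paper. The loss is the standard DPO objective on sequence log-probabilities, not necessarily the authors' exact variant.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StylePlan:
    """Hypothetical three-part style plan; field names are illustrative only."""
    format_signature: str
    syntactic: str
    pragmatic: str

    def render(self) -> str:
        # Serialize the plan as the shared chain-of-thought text.
        return (
            f"[format] {self.format_signature}\n"
            f"[syntax] {self.syntactic}\n"
            f"[pragmatics] {self.pragmatic}"
        )


def build_cot_shared_pair(source: str, plan: StylePlan,
                          chosen: str, rejected: str) -> dict:
    """Build one preference example whose two completions share the same plan.

    Assumption: the plan is part of the prompt, so the preference signal
    comes only from the surface rewrite, not from the reasoning trace.
    """
    prompt = f"Rewrite in character: {source}\nPlan:\n{plan.render()}\nOutput: "
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair, given summed log-probabilities
    of each completion under the policy and the frozen reference model."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative usage with made-up strings and log-probabilities.
plan = StylePlan("ends with a tilde", "short clauses", "teasing tone")
pair = build_cot_shared_pair("I am going home.", plan,
                             chosen="Heading home now~",
                             rejected="I am going home.")
loss = dpo_loss(pol_chosen=-4.0, pol_rejected=-9.0,
                ref_chosen=-6.0, ref_rejected=-6.0)
```

Because both completions are conditioned on an identical plan, any difference in their log-probabilities comes from the output tokens, which matches the abstract's stated goal of targeting output-level style execution rather than differences in the reasoning trace.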
Taxonomy
Topics: Artificial Intelligence in Games · Machine Learning in Healthcare · Topic Modeling
