From Sub-Ability Diagnosis to Human-Aligned Generation: Bridging the Gap for Text Length Control via MARKERGEN

Peiwen Yuan; Chuyi Tan; Shaoxiong Feng; Yiwei Li; Xinglin Wang; Yueqi Zhang; Jiayi Shi; Boyuan Pan; Yao Hu; Kan Li

arXiv:2502.13544·cs.CL·June 10, 2025

TL;DR

This paper introduces MarkerGen, an approach to length-controlled text generation in large language models (LLMs). It decomposes length control into sub-abilities, models length explicitly with inserted markers, and generates in three stages, improving adherence to length constraints while maintaining content quality.

Contribution

The paper presents a bottom-up decomposition of the sub-abilities behind length-controllable text generation (LCTG) and a plug-and-play method, MarkerGen, that improves LCTG performance and generalizability.

Findings

01 Significant improvement in length adherence across various settings.

02 Effective external tool integration enhances LLM capabilities.

03 Three-stage generation scheme maintains content quality while controlling length.

Abstract

Despite the rapid progress of large language models (LLMs), their length-controllable text generation (LCTG) ability remains below expectations, posing a major limitation for practical applications. Existing methods mainly focus on end-to-end training to reinforce adherence to length constraints. However, the lack of decomposition and targeted enhancement of LCTG sub-abilities restricts further progress. To bridge this gap, we conduct a bottom-up decomposition of LCTG sub-abilities with human patterns as reference and perform a detailed error analysis. On this basis, we propose MarkerGen, a simple-yet-effective plug-and-play approach that: (1) mitigates LLM fundamental deficiencies via external tool integration; (2) conducts explicit length modeling with dynamically inserted markers; (3) employs a three-stage generation scheme to better align length constraints while maintaining content…
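
To make the idea of explicit length modeling more concrete, the following is a minimal sketch of how running-count markers could be interleaved with generated text by an external counter. It is an illustration only, not the authors' implementation: the function names `insert_length_markers` and `strip_markers`, the `[n]` marker format, whitespace word counting, and the fixed insertion interval are all assumptions that do not come from the paper.

```python
def insert_length_markers(words, interval=5):
    """Interleave running word-count markers with a list of words.

    Hypothetical helper: the counting is done by ordinary code (standing in
    for an external tool) rather than by the language model itself.
    The fixed `interval` and the "[n]" format are assumptions.
    """
    out = []
    for count, word in enumerate(words, start=1):
        out.append(word)
        # After every `interval` words, record how many words exist so far.
        if count % interval == 0:
            out.append(f"[{count}]")
    return out


def strip_markers(tokens):
    """Remove "[n]" markers so the final text contains only the words."""
    return [
        t for t in tokens
        if not (t.startswith("[") and t.endswith("]") and t[1:-1].isdigit())
    ]


if __name__ == "__main__":
    words = "one two three four five six seven".split()
    marked = insert_length_markers(words, interval=3)
    print(" ".join(marked))
    print(" ".join(strip_markers(marked)))
```

In a setup like this, the markers would give the generator an explicit view of how much text has been produced so far, and they would be removed before the output is shown to the user.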

Taxonomy

Topics: Natural Language Processing Techniques · Handwritten Text Recognition Techniques

Methods: ALIGN · Focus