Language Models can Self-Lengthen to Generate Long Texts
Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, Junyang Lin

TL;DR
This paper introduces Self-Lengthen, an iterative training framework enabling large language models to generate longer, more aligned texts without auxiliary data, outperforming existing methods on benchmarks and human evaluations.
Contribution
The paper presents a novel Self-Lengthen framework that leverages intrinsic model capabilities to improve long-text generation without auxiliary data or proprietary models.
Findings
Outperforms existing methods on benchmarks.
Effective in generating longer, aligned texts.
Applicable to open-source LLMs like Qwen2 and LLaMA3.
Abstract
Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to process long contexts, yet a notable gap remains in generating long, aligned outputs. This limitation stems from a training gap where pre-training lacks effective instructions for long-text generation, and post-training data primarily consists of short query-response pairs. Current approaches, such as instruction backtranslation and behavior imitation, face challenges including data quality, copyright issues, and constraints on proprietary model usage. In this paper, we introduce an innovative iterative training framework called Self-Lengthen that leverages only the intrinsic knowledge and skills of LLMs without the need for auxiliary data or proprietary models. The framework consists of two roles: the Generator and the Extender. The Generator produces the initial response, which is then…
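The two-role loop described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the names `generate`, `extend`, `lengthen_round`, and `min_growth` are hypothetical, the toy functions stand in for LLM calls, and the length-growth filter is an assumed surface-level check.

```python
from typing import Callable, List, Tuple

# Hypothetical role signatures; in practice each would wrap an LLM call.
Generator = Callable[[str], str]        # instruction -> initial response
Extender = Callable[[str, str], str]    # (instruction, response) -> longer response


def word_count(text: str) -> int:
    """Count whitespace-separated tokens as a rough length measure."""
    return len(text.split())


def lengthen_round(
    instructions: List[str],
    generate: Generator,
    extend: Extender,
    min_growth: float = 1.2,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str]]:
    """Collect (instruction, extended response) pairs from one round."""
    pairs = []
    for instruction in instructions:
        draft = generate(instruction)
        extended = extend(instruction, draft)
        # Assumed filter: keep only responses that actually grew in length.
        if word_count(extended) >= min_growth * word_count(draft):
            pairs.append((instruction, extended))
    return pairs


# Toy stand-ins so the sketch runs without a model.
def toy_generate(instruction: str) -> str:
    return f"A short answer to: {instruction}"


def toy_extend(instruction: str, response: str) -> str:
    return response + " with several more sentences of added detail"


if __name__ == "__main__":
    for instruction, response in lengthen_round(
        ["Write a story"], toy_generate, toy_extend
    ):
        print(instruction, "->", word_count(response), "words")
```

In an iterative setup such as the one the abstract names, the collected pairs would presumably serve as training data before the next round; that training step is outside this sketch.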
Peer Reviews
Decision·ICLR 2025 Conference Withdrawn Submission
1. Self-Lengthen is cost-effective and easy to use. It only requires a set of seed instructions for long-text output tasks and an open-source instruction model to automatically enhance the model's ability to generate long-text outputs. 2. This paper proposes a two-stage extension method that ensures the extension does not end normally. This creates space for the model to seamlessly connect preceding and succeeding segments, thereby enhancing its ability to complete extension tasks. 3. The auth
1. The paper adopts a rule-based approach to filter out invalid responses to ensure their quality. How is 'frequent repetition' specifically determined? Is it based on rules or scored by a more advanced LLM? If the extended responses merely describe the same meaning in different styles, is this kind of extension meaningful? 2. During the process of instruction evolution, some instruction data are inherently unsuitable for this type of evolution. For example, if the response to an instruction is
- This paper proposes a new method to improve language models’ long-form generation performance. - This topic is relevant to a wide range of applications which are bottlenecked by the response length that language models can reliably output.
- The proposed method contains a seemingly arbitrary decision of truncating the response to ½ or ⅔ for further extension. It is unclear why these cutoffs were chosen and how they compare to other cutoffs. - The proposed method utilized surface form heuristics (e.g. length, repetition) to ensure the quality of extended responses, while the semantic content is not quality assured. It is unclear if training on synthetic self-generated data hurts other LM capabilities, e.g., math/code reasoning and
1. Compared to the previous method, Self-Lengthen has no need for auxiliary data or powerful proprietary models, and supports outputs with more diverse styles and types. 2. Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation when applied to top open-source LLMs such as Qwen2 and LLaMA3.
1. The design of LonGen benchmark is too similar to the benchmark in LongWriter (i.e., LongBench-Write), and many tables (e.g., Table 2, 3) and figures (e.g., Fig 5, 6) are similar to those in LongWriter without proper citations. The authors should give a more detailed explanation and comparison. 2. There are many missing details in the experiments, including the calculation method of distinct scores, the training data statistics, and the supported maximum output length. 3. The length control
1. The proposed Self-Lengthen framework introduces a unique iterative approach to improve long-text generation by utilizing the intrinsic capabilities of LLMs without relying on additional external datasets or proprietary models. 2. The method is simple yet practical, focusing on leveraging existing models' capabilities through iterative extension. This makes the method easy to implement and potentially scalable to various domains where long-text generation is required. 3. The authors conduct
1. **Motivation**: The motivation for this work is not sufficiently compelling. Regarding the instruction backtranslation method, SlimPajama already provides a large amount of real-world long-text data [1]. For the behavior imitation approach, it is difficult to agree that there is a significant difference between using GPT-4 and open-source models. LongWriter's AgentWrite method can also use open-source models to generate data, which undermines the claimed uniqueness of Self-Len
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
