Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation
Chengbing Wang, Yang Zhang, Wenjie Wang, Xiaoyan Zhao, Fuli Feng, Xiangnan He, Tat-Seng Chua

TL;DR
FlyThinker introduces a concurrent reasoning and generation framework for personalized long-form content, improving adaptability and efficiency over static methods by reasoning in parallel during response creation.
Contribution
The paper presents FlyThinker, an on-the-fly reasoning approach that reasons in parallel with long-form generation to produce personalized responses, improving efficiency and adaptability.
Findings
Achieves better personalized long-form generation results.
Maintains training and inference efficiency comparable to SFT.
Outperforms static reasoning methods on benchmarks.
Abstract
Preference alignment has enabled large language models (LLMs) to better reflect human expectations, but current methods mostly optimize for population-level preferences, overlooking individual users. Personalization is essential, yet early approaches, such as prompt customization or fine-tuning, struggle to reason over implicit preferences, limiting real-world effectiveness. Recent "think-then-generate" methods address this by reasoning before response generation. However, they face challenges in long-form generation: their static one-shot reasoning must capture all relevant information for the full response generation, making learning difficult and limiting adaptability to evolving content. To address this issue, we propose FlyThinker, an efficient "think-while-generating" framework for personalized long-form generation. FlyThinker employs a separate reasoning model that generates latent…
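The abstract is truncated here, so the following is only a toy sketch of the general "think-while-generating" idea, not the authors' implementation. It assumes, based on the reviews' description, that each reasoning output depends on the user context and the response prefix but not on earlier reasoning outputs. The functions `reasoner`, `generator`, `think_while_generating`, and `latents_for_training` are hypothetical stand-ins that use simple arithmetic instead of neural networks.

```python
from typing import List, Tuple


def reasoner(context: List[int], prefix: List[int]) -> float:
    """Toy 'latent reasoning' signal for the next position (hypothetical).

    It reads only the user context and the response prefix,
    never the latents produced at earlier positions.
    """
    return ((sum(context) + 3 * sum(prefix) + len(prefix)) % 7) / 7.0


def generator(context: List[int], prefix: List[int], latent: float,
              vocab_size: int = 10) -> int:
    """Toy generator (hypothetical): next token id from context, prefix, latent."""
    return (sum(context) + sum(prefix) + round(latent * 7)) % vocab_size


def think_while_generating(context: List[int],
                           steps: int) -> Tuple[List[int], List[float]]:
    """Interleave reasoning and generation: one fresh latent per token."""
    prefix: List[int] = []
    latents: List[float] = []
    for _ in range(steps):
        z = reasoner(context, prefix)      # reason about the next position
        latents.append(z)
        prefix.append(generator(context, prefix, z))  # emit guided token
    return prefix, latents


def latents_for_training(context: List[int],
                         target: List[int]) -> List[float]:
    """With a known target response, each latent needs only target[:t].

    No latent depends on another latent, so every position could be
    computed independently (in one parallel pass in a real system).
    """
    return [reasoner(context, target[:t]) for t in range(len(target))]


if __name__ == "__main__":
    tokens, latents = think_while_generating([1, 2, 3], steps=5)
    # The position-wise recomputation matches the step-by-step latents.
    print(latents_for_training([1, 2, 3], tokens) == latents)
```

Because the toy reasoner never reads its own earlier outputs, the latents recomputed position by position from a finished response match the ones produced step by step during generation. This is the property one reviewer credits with enabling a one-pass training process.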
Peer Reviews
Decision · ICLR 2026 Poster
1. The “think-while-generating” paradigm is well-motivated, and the overall methodology is simple and intuitive.
2. The evaluation is comprehensive with multiple metrics, showing the effectiveness of the proposed method over baselines.
3. FlyThinker achieves training and inference efficiency comparable to SFT, which is a major advantage relative to existing reasoning-augmented methods that typically incur higher latency.
1. It is not clear whether the reported personalization results are based on the user-based split (testing on unseen users) or the temporal split (testing on later instances of seen users). Since these settings test different personalization abilities (cross-user vs. within-user), clarification or stratified results would make the findings more interpretable.
2. While the paper ablates on Reasoner size, it is not clear how the Reasoner scales relative to the Generator, e.g., what would be the s
- The proposed approach seems useful for efficiently combining reasoning and generation. It provides a simple way to align to user preferences.
- The method is flexible and can be used with models of different sizes. It is particularly convenient that a small reasoning model can be combined with a larger generation model for greater efficiency.
- The writing and presentation of the paper are clear. The method section in particular is very easy to read and clearly lays out the method.
While the proposed approach is interesting, the evaluation experiments in their current form are not sufficiently comprehensive.
1. The evaluation is based only on simple automated metrics (ROUGE, BLEU, METEOR, BERT-Score). To get a full understanding of how much FlyThinker improves personalization, it would be useful to have a user study or an automatic LLM evaluation of preferred personalized outputs.
2. The experiments are mostly limited to Qwen2.5-3B-Instruct. The small set of Qwen2.5-7B-I
1. The concept of generating reasoning tokens and response tokens simultaneously is novel and intriguing to me.
2. The idea of decoupling reasoning tokens from previously generated reasoning outputs is particularly interesting, as it enables a one-pass training process and significantly improves overall efficiency.
1. The title of Figure 3 is somewhat misleading and should be revised to “Training Efficiency / Inference Efficiency.” Although the proposed method demonstrates shorter runtime, it relies on two separate models (the reasoning model and the generation model), which substantially increases memory consumption. The authors should therefore provide a comparison of the actual computational cost against other baselines to present a fair assessment.
2. Reasoning models typically show the greatest advantage
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Mobile Crowdsensing and Crowdsourcing
