RECAST: Expanding the Boundaries of LLMs' Complex Instruction Following with Multi-Constraint Data
Zhengkang Guo, Wenhao Liu, Mingchen Xie, Jingwen Xu, Zisu Huang, Muzhao Tian, Jianhan Xu, Yuanzhe Shen, Qi Qian, Muling Wu, Xiaohua Wang, Changze Lv, He-Da Wang, Hu Yao, Xiaoqing Zheng, Xuanjing Huang

TL;DR
This paper introduces RECAST, a scalable framework for synthesizing large datasets in which each instance carries more than 10 constraints, beyond what existing datasets cover, to improve LLMs' ability to follow complex instructions.
Contribution
We propose RECAST, a novel method for synthesizing datasets with over 10 constraints per instance, significantly expanding the complexity of instruction-following benchmarks for LLMs.
Findings
Models trained on RECAST-30K show improved complex instruction following.
RECAST enables automatic verification of constraints, facilitating reinforcement learning.
Fine-tuning on RECAST-30K maintains general model capabilities.
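The second finding, that constraints can be verified automatically, can be illustrated with a small sketch. This is not the RECAST implementation: the constraint types, the helper names (`max_words`, `must_include`, `ends_with`, `constraint_reward`), and the choice of reward as the fraction of satisfied constraints are all assumptions made for illustration.

```python
from typing import Callable, List

# A checker maps a response string to True (satisfied) or False (violated).
Checker = Callable[[str], bool]

def max_words(limit: int) -> Checker:
    """Hypothetical rule-based constraint: at most `limit` whitespace-separated words."""
    return lambda response: len(response.split()) <= limit

def must_include(keyword: str) -> Checker:
    """Hypothetical rule-based constraint: keyword appears (case-insensitive)."""
    return lambda response: keyword.lower() in response.lower()

def ends_with(suffix: str) -> Checker:
    """Hypothetical rule-based constraint: response ends with `suffix`."""
    return lambda response: response.rstrip().endswith(suffix)

def constraint_reward(response: str, checkers: List[Checker]) -> float:
    """Assumed reward: fraction of constraints the response satisfies."""
    if not checkers:
        return 0.0
    satisfied = sum(1 for check in checkers if check(response))
    return satisfied / len(checkers)

checkers = [max_words(12), must_include("RECAST"), ends_with(".")]
good = "RECAST synthesizes instructions with many constraints."
bad = "This answer ignores the keyword and has no final period"

print(constraint_reward(good, checkers))
print(constraint_reward(bad, checkers))
```

Because each checker is a deterministic function of the response, a score of this kind can be computed without human labels, which is what makes it usable as a reinforcement learning signal. Model-based constraints would need a judge model in place of these rules.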
Abstract
Large language models (LLMs) are increasingly expected to tackle complex tasks, driven by their expanding applications and users' growing proficiency in crafting sophisticated prompts. However, as the number of explicitly stated requirements increases (particularly more than 10 constraints), LLMs often struggle to accurately follow such complex instructions, which limits their applicability in complex real-world scenarios. To the best of our knowledge, existing datasets do not exceed 10 constraints per instance. To address this challenge, we propose RECAST, an efficient and scalable framework for synthesizing datasets where each example incorporates far more constraints than those in existing benchmarks, aiming to challenge and extend the boundaries of models' ability to follow complex instructions. These constraints are extracted from real-world prompt-response pairs to ensure…
Peer Reviews
Decision: ICLR 2026 Poster
- **Clear motivation and problem:** The paper targets a real gap: LLMs’ performance drops as the number of explicit constraints increases, and existing SFT datasets usually don’t cover a high number of constraints.
- **The pipeline is complete and reproducible:** The paper details each stage (constraint extraction, instruction enhancement, response synthesis, validation), gives prompts/templates, and reports human agreement metrics. RLVC is specified with a GRPO objective and training setup. Thi
- **Lack of theoretical positioning against RLVR methods:** The paper does not situate its RL formulation within the growing line of Reinforcement Learning from Verifiable Rewards (RLVR) research, despite clear conceptual overlap. Although the original RLVR paper is cited [1], it is cited only in the context of the dataset. Other foundational RLVR works [2,3] and subsequent works applying the verifiable reward mechanisms to multi-constraint or format-constrained instruction-following [4,5]
This submission has the following strengths:
- The paper demonstrates clear writing and a well-structured organization.
- The proposed dataset RECAST-30K is large in scale and contains an adequate number of constraints.
- Experimental results show that training with RECAST can effectively enhance large language models' ability to follow instructions.
This submission has the following weaknesses:
- For model-based constraints, the quality depends on the large language models used.
- The number of rule-based constraints is much smaller than the number of model-based constraints.
1. The framework incorporates an extended number of constraints in a single instruction, which facilitates evaluating and improving LLMs' ability to follow more complex instructions as the tasks given to LLMs become increasingly complicated.
2. The framework is automated and scalable, providing an efficient approach to synthesizing datasets with multiple constraints per instruction.
3. LLMs trained on data from this framework exhibit improved performance and good generalization o
1. Some model-based constraints may not be suitable for binary evaluation. For example, for the "Helpfulness" constraint, it is common for LLMs to generate two responses that are both helpful but to different degrees. If both are judged as satisfying the constraint, the gap between the two responses is erased, which is not conducive to fine-grained optimization of model performance.
2. The effectiveness of RLVC on reasoning models is not validated. Both Qwen-2.5-7B and Llama-3.1-8B are non-rea
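The first weakness in the review above can be made concrete with a small numerical sketch. It assumes a hypothetical judge that returns graded helpfulness scores in [0, 1], an assumed threshold of 0.5 for the binary verdict, and a group-relative advantage of the form (reward minus group mean) divided by group standard deviation, as used in GRPO-style training. The excerpt above does not give the paper's exact reward formulation, so none of these values come from the paper.

```python
from statistics import mean, pstdev
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantages: (r - mean) / (std + eps).

    Population standard deviation is an assumption; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def binarize(score: float, threshold: float = 0.5) -> float:
    """Assumed binary verdict: satisfied if the graded score reaches the threshold."""
    return 1.0 if score >= threshold else 0.0

# Hypothetical graded helpfulness scores for two responses to one prompt.
graded = [0.9, 0.6]
binary = [binarize(s) for s in graded]

adv_graded = group_advantages(graded)   # first response is preferred
adv_binary = group_advantages(binary)   # both verdicts equal, so no preference

print(adv_graded)
print(adv_binary)
```

Under these assumptions, both responses pass the binary check, so their group-relative advantages are zero and the more helpful response receives no extra credit. With graded scores, the advantages differ in sign.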
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
