Light-IF: Endowing LLMs with Generalizable Reasoning via Preview and Self-Checking for Complex Instruction Following
Chenyang Wang, Liang Wen, Shousheng Jia, Xiangzheng Zhang, Liang Xu

TL;DR
This paper introduces Light-IF, a framework that strengthens large language models' reasoning and instruction adherence through preview, self-checking, and entropy-based training, yielding gains across multiple benchmarks.
Contribution
The paper presents a novel framework combining prompt filtering, rejection sampling, and entropy-preserving fine-tuning to improve LLMs' generalizable reasoning and instruction-following capabilities.
Findings
Light-IF-32B outperforms both larger open-source models such as DeepSeek-R1 and closed-source models such as Doubao-1.6.
The approach significantly improves instruction adherence and reasoning accuracy.
Extensive experiments on multiple benchmarks validate the effectiveness of the approach.
Abstract
While advancements in the reasoning abilities of LLMs have significantly enhanced their performance in solving mathematical problems, coding tasks, and general puzzles, their effectiveness in accurately adhering to instructions remains inconsistent, particularly with more complex directives. Our investigation identifies lazy reasoning during the thinking stage as the primary factor contributing to poor instruction adherence. To mitigate this issue, we propose a comprehensive framework designed to enable rigorous reasoning processes involving preview and self-checking, essential for satisfying strict instruction constraints. Specifically, we first generate instructions with complex constraints and apply a filtering process to obtain valid prompts, resulting in three distinct prompt datasets categorized as hard, easy, and pass. Then, we employ rejection sampling on the pass prompts to…
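The filtering and rejection-sampling steps named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the hard, easy, and pass buckets are defined, so the pass-rate thresholds, the function names, and the toy constraint checker are hypothetical and may differ from the authors' pipeline.

```python
from typing import Callable, List

# Hypothetical thresholds: the source does not define the buckets.
EASY_MIN_PASS_RATE = 0.9
HARD_MAX_PASS_RATE = 0.1


def pass_rate(responses: List[str], check: Callable[[str], bool]) -> float:
    """Fraction of sampled responses that satisfy the constraint check."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if check(r)) / len(responses)


def bucket_prompt(rate: float) -> str:
    """Assign a prompt to 'easy', 'hard', or 'pass' (assumed rule)."""
    if rate >= EASY_MIN_PASS_RATE:
        return "easy"
    if rate <= HARD_MAX_PASS_RATE:
        return "hard"
    return "pass"


def rejection_sample(responses: List[str], check: Callable[[str], bool]) -> List[str]:
    """Keep only the responses that pass the constraint check."""
    return [r for r in responses if check(r)]


def toy_check(text: str) -> bool:
    """Toy constraint: at most five words and ends with a period."""
    return len(text.split()) <= 5 and text.endswith(".")


# Stand-ins for responses sampled from a model for one prompt.
samples = [
    "Short answer.",
    "This answer is far too long to pass.",
    "No period",
]

rate = pass_rate(samples, toy_check)
bucket = bucket_prompt(rate)
kept = rejection_sample(samples, toy_check) if bucket == "pass" else []
```

In this sketch only prompts in the pass bucket are rejection-sampled, mirroring the abstract's statement that rejection sampling is applied to the pass prompts; a real pipeline would need a verifiable checker for each constraint in the instruction.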
