TL;DR
SPRINT is a framework that enables large reasoning models to identify independent subtasks and execute them in parallel, significantly reducing inference time while maintaining performance on complex reasoning tasks.
Contribution
The paper introduces SPRINT, a post-training and inference-time framework that reorganizes reasoning trajectories into rounds of planning and parallel execution, improving inference efficiency in large reasoning models.
Findings
Up to 39% reduction in sequential tokens on complex tasks.
Maintains performance while enabling parallel reasoning.
Effective transfer to out-of-distribution tasks with significant token reduction.
Abstract
Large reasoning models (LRMs) excel at complex reasoning tasks but typically generate lengthy sequential chains-of-thought, resulting in long inference times before arriving at the final answer. To address this challenge, we introduce SPRINT, a novel post-training and inference-time framework designed to enable LRMs to dynamically identify and exploit opportunities for parallelization during their reasoning process. SPRINT incorporates an innovative data curation pipeline that reorganizes natural language reasoning trajectories into structured rounds of long-horizon planning and parallel execution. By fine-tuning LRMs on a small amount of such curated data, the models learn to dynamically identify independent subtasks within extended reasoning processes and effectively execute them in parallel. Through extensive evaluations, we demonstrate that models fine-tuned with the SPRINT…
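The rounds of planning and parallel execution described in the abstract can be illustrated with a minimal sketch. Everything named here (`Plan`, `Result`, `run_rounds`, `toy_planner`, `toy_executor`) is hypothetical and not taken from the paper. The sketch also assumes that "sequential tokens" means the planner's tokens plus the longest parallel branch in each round, which is one plausible reading rather than the paper's stated definition.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Plan:
    subtasks: List[str]            # independent subtasks for this round
    answer: Optional[str] = None   # set when the planner decides to stop
    tokens: int = 0                # tokens the planner generated


@dataclass
class Result:
    text: str
    tokens: int                    # tokens the executor generated


def run_rounds(
    question: str,
    planner: Callable[[List[str]], Plan],
    executor: Callable[[str], Result],
    max_rounds: int = 8,
) -> Tuple[Optional[str], int]:
    """Alternate planning and parallel execution (hypothetical sketch)."""
    context = [question]
    sequential_tokens = 0
    with ThreadPoolExecutor() as pool:
        for _ in range(max_rounds):
            plan = planner(context)
            sequential_tokens += plan.tokens
            if plan.answer is not None:
                return plan.answer, sequential_tokens
            # Subtasks within a round are assumed independent, so they run concurrently.
            results = list(pool.map(executor, plan.subtasks))
            # Assumption: only the longest branch adds to sequential latency.
            sequential_tokens += max((r.tokens for r in results), default=0)
            context.extend(r.text for r in results)
    return None, sequential_tokens


# Toy stand-ins for model calls; a real system would query an LRM here.
def toy_planner(context: List[str]) -> Plan:
    if len(context) == 1:
        return Plan(subtasks=["2+3", "10+20+30"], tokens=10)
    total = sum(int(x) for x in context[1:])
    return Plan(subtasks=[], answer=str(total), tokens=5)


def toy_executor(task: str) -> Result:
    value = sum(int(part) for part in task.split("+"))
    return Result(text=str(value), tokens=len(task))


answer, seq_tokens = run_rounds("Add the two sums.", toy_planner, toy_executor)
```

In this toy run, the two subtasks cost 3 and 8 tokens, but only the longer one counts toward the sequential total (10 + 8 + 5 = 23), whereas running them one after the other would cost 26.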