TL;DR
This paper introduces SLIM, a framework that identifies and eliminates suboptimal components in the reasoning trajectories used to fine-tune large language models, improving inference efficiency and accuracy on challenging benchmarks.
Contribution
SLIM's '5+2' framework systematically detects and removes subpar reasoning subtrajectories, enhancing model performance with less training data and under resource constraints.
Findings
Reduces suboptimal subtrajectories by 25.9% during inference.
Achieves 58.92% accuracy on math benchmarks using two-thirds of the training data.
Outperforms models trained on full datasets and open-source benchmarks.
Abstract
In recent months, substantial progress has been made in the complex reasoning of Large Language Models (LLMs), particularly through the application of test-time scaling. Notable examples include the o1/o3/o4 series and DeepSeek-R1. When responding to a query, these models generate an extended reasoning trajectory, during which the model explores, reflects, backtracks, and self-verifies before arriving at a conclusion. However, fine-tuning models with such reasoning trajectories may not always be optimal. Our findings indicate that not all components within these reasoning trajectories contribute positively to the reasoning process; in fact, some components may affect overall performance negatively. In this study, we divide a reasoning trajectory into individual subtrajectories and develop a "5+2" framework to: (1) systematically identify suboptimal subtrajectories within the reasoning trajectory…
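To make "dividing a reasoning trajectory into subtrajectories" concrete, here is a minimal sketch. The abstract is truncated before the "5+2" criteria are stated, so everything below is hypothetical and is not SLIM's method. It assumes that a new subtrajectory starts at a reflection cue word (`CUES`), and it uses a toy criterion (`is_repeat`, which flags a verbatim repeat of an earlier step) as a stand-in for the paper's actual suboptimality checks. The names `split_subtrajectories` and `prune` are illustrative only.

```python
import re
from typing import Callable, List

# Hypothetical cue words assumed to open a new subtrajectory.
# This is an illustrative assumption, not the paper's segmentation rule.
CUES = ("Wait", "Alternatively", "Let me verify", "Hmm")

# Zero-width lookahead: split *before* each cue so the cue stays with its step.
_SPLIT = re.compile(r"(?=\b(?:" + "|".join(re.escape(c) for c in CUES) + r")\b)")


def split_subtrajectories(trajectory: str) -> List[str]:
    """Split a reasoning trajectory into subtrajectories at cue words."""
    return [part.strip() for part in _SPLIT.split(trajectory) if part.strip()]


def is_repeat(sub: str, history: List[str]) -> bool:
    """Toy stand-in criterion: flag a step that repeats an earlier kept step verbatim."""
    return sub in history


def prune(trajectory: str,
          is_suboptimal: Callable[[str, List[str]], bool]) -> List[str]:
    """Keep only the subtrajectories that the given criterion does not flag."""
    kept: List[str] = []
    for sub in split_subtrajectories(trajectory):
        if not is_suboptimal(sub, kept):
            kept.append(sub)
    return kept


if __name__ == "__main__":
    t = "x = 4. Wait, check: 2*2 = 4. Wait, check: 2*2 = 4. Alternatively, 8/2 = 4."
    print(split_subtrajectories(t))  # four subtrajectories
    print(prune(t, is_repeat))       # the duplicated check is removed
```

Because the suboptimality test is passed in as a function, the toy `is_repeat` check could be swapped for any scoring rule without changing the segmentation or pruning loop.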