TL;DR
TRIM introduces a stepwise routing approach for multi-step reasoning tasks, selectively deploying larger models only for critical steps to improve efficiency and accuracy.
Contribution
It proposes a novel step-level routing strategy that enhances inference efficiency by targeting only problematic reasoning steps for larger models.
Findings
TRIM achieves 5x higher cost efficiency on MATH-500.
Advanced policies match performance of expensive models with 80% fewer tokens.
Method generalizes well across different math reasoning benchmarks.
Abstract
Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical stepsthose likely to derail the solutionto larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step-level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
