TL;DR
SAGE-32B is a 32-billion-parameter language model designed for agentic reasoning. It uses iterative distillation and inverse reasoning to improve task decomposition, tool use, and error recovery in long-range planning tasks.
Contribution
The paper introduces SAGE-32B, a large language model trained with iterative distillation and inverse reasoning, specifically optimized for agentic reasoning and planning tasks.
Findings
SAGE-32B outperforms baseline models on agentic reasoning benchmarks.
The model achieves higher success rates in multi-tool usage scenarios than similarly sized baseline models.
It maintains competitive performance on standard reasoning tasks.
Abstract
We present SAGE-32B, a 32-billion-parameter language model that focuses on agentic reasoning and long-range planning tasks. Unlike chat models that aim for general conversational fluency, SAGE-32B is designed to operate in an agentic loop, emphasizing task decomposition, tool usage, and error recovery. The model is initialized from the Qwen2.5-32B pretrained model and fine-tuned using Iterative Distillation, a two-stage training process that improves reasoning performance through rigorously tested feedback loops. SAGE-32B also introduces an inverse reasoning approach, which uses a metacognition head to forecast potential failures in the planning process before execution. On agentic reasoning benchmarks including MMLU-Pro, AgentBench, and MATH-500, SAGE-32B achieves higher success rates in multi-tool usage scenarios compared to similarly sized baseline models, while remaining…
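The agentic loop described in the abstract (execute a planned step, forecast failure before running it, recover from errors) can be illustrated with a small sketch. This is not the paper's implementation: `Step`, `run_agent`, `predict_failure`, `replan`, the 0.5 threshold, and the retry limit are all hypothetical names and values chosen for illustration, and the failure forecaster here is a plain Python callable standing in for the model's metacognition head.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str      # name of the tool to call (hypothetical schema)
    argument: str  # a single string argument, kept simple on purpose


def run_agent(
    plan: List[Step],
    tools: Dict[str, Callable[[str], str]],
    predict_failure: Callable[[Step], float],
    replan: Callable[[Step], Step],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    max_retries: int = 2,    # assumed retry budget, not taken from the paper
) -> List[str]:
    """Run each step, revising it when failure is forecast or observed."""
    results: List[str] = []
    for step in plan:
        for _attempt in range(max_retries + 1):
            # Forecast failure before execution; revise risky steps first.
            if predict_failure(step) >= threshold:
                step = replan(step)
                continue
            try:
                results.append(tools[step.tool](step.argument))
                break
            except Exception:
                # Error recovery: revise the step and try again.
                step = replan(step)
        else:
            # All attempts were used without a successful tool call.
            results.append(f"failed: {step.tool}")
    return results


# Toy usage: one tool, a rule-based stand-in for the failure forecaster.
toy_tools = {"add_one": lambda s: str(int(s) + 1)}


def toy_predict_failure(step: Step) -> float:
    return 0.1 if step.argument.isdigit() else 0.9


def toy_replan(step: Step) -> Step:
    return Step(step.tool, "0")


toy_plan = [Step("add_one", "41"), Step("add_one", "abc")]
print(run_agent(toy_plan, toy_tools, toy_predict_failure, toy_replan))
```

In the toy run, the second step is flagged as likely to fail before it is executed, so it is revised rather than allowed to raise. A real system would replace the rule-based forecaster and the trivial `replan` with model calls.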
