When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
Md Fahim Anjum

TL;DR
This paper demonstrates that a small, 1.5B parameter reasoning model can outperform larger non-reasoning LLMs as a discriminator in text-to-SQL tasks, highlighting the effectiveness of reasoning capabilities in evaluation tasks.
Contribution
The study introduces a novel method for extracting soft scores from chain-of-thought outputs and benchmarks a distilled reasoning model against larger non-reasoning models, revealing its superior discrimination performance.
Findings
DeepSeek-R1-1.5B outperforms larger non-reasoning LLMs in F1 and discrimination accuracy.
The logical capabilities of reasoning models hit a limit that additional context or compute does not overcome.
Reasoning models are more effective as discriminators than as generators in LLM planning.
Abstract
Large Language Models (LLMs) with reasoning capabilities offer a promising path for improving candidate evaluation in planning frameworks, but their performance relative to traditional non-reasoning models remains largely underexplored. In this study, we benchmark a distilled 1.5B-parameter reasoning model (DeepSeek-R1) against several state-of-the-art non-reasoning LLMs within a generator-discriminator LLM planning framework for the text-to-SQL task. To do so, we introduce a novel method for extracting soft scores from the chain-of-thought (CoT) outputs of reasoning models, which enables fine-grained ranking of candidates. Our central hypothesis is that reasoning models are more effective discriminators than non-reasoning LLMs. Our results show that distilled DeepSeek-R1-1.5B achieves higher F1 and better discrimination accuracy than CodeLlama-7B, as well as …
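The abstract does not spell out how the soft scores are computed, so the following is only a minimal sketch of the general idea of ranking candidate SQL queries by a continuous discriminator score instead of a hard yes/no verdict. It assumes, hypothetically, that the discriminator's final answer after its chain of thought exposes log-probabilities for a "yes" and a "no" token. The names `Judgement`, `soft_score`, and `rerank` are illustrative and not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Judgement:
    """One candidate SQL query plus the discriminator's verdict log-probs.

    logp_yes / logp_no are ASSUMED to be the log-probabilities of the
    tokens "yes" and "no" at the final-answer position, after the CoT.
    """
    sql: str
    logp_yes: float
    logp_no: float


def soft_score(logp_yes: float, logp_no: float) -> float:
    """Normalised probability of "yes" (two-way softmax over the log-probs)."""
    m = max(logp_yes, logp_no)          # subtract the max for numerical stability
    e_yes = math.exp(logp_yes - m)
    e_no = math.exp(logp_no - m)
    return e_yes / (e_yes + e_no)


def rerank(judgements: list[Judgement]) -> list[Judgement]:
    """Sort candidates from most to least likely correct."""
    return sorted(
        judgements,
        key=lambda j: soft_score(j.logp_yes, j.logp_no),
        reverse=True,
    )


# Hypothetical candidates produced by a generator for one question.
candidates = [
    Judgement("SELECT name FROM users", math.log(0.4), math.log(0.6)),
    Judgement("SELECT name FROM users WHERE active = 1", math.log(0.9), math.log(0.1)),
    Judgement("SELECT * FROM users", math.log(0.2), math.log(0.8)),
]
best = rerank(candidates)[0]
```

A continuous score lets two candidates that both receive a "yes" verdict still be ordered relative to each other, which a binary verdict cannot do.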
Taxonomy
Topics: Artificial Intelligence in Law
