When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator

Md Fahim Anjum

arXiv:2505.03786·cs.LG·May 8, 2025

TL;DR

This paper demonstrates that a small, 1.5B-parameter reasoning model can outperform larger non-reasoning LLMs as a discriminator on the text-to-SQL task, highlighting the value of reasoning capabilities for candidate evaluation.

Contribution

The study introduces a novel method for extracting soft scores from chain-of-thought outputs and benchmarks a distilled reasoning model against larger non-reasoning models, showing that the reasoning model is the stronger discriminator.

Findings

1. DeepSeek-R1-1.5B outperforms larger non-reasoning LLMs in F1 and discrimination accuracy.
2. Reasoning models face limitations in logical capabilities despite increased context or compute.
3. Reasoning models are more effective as discriminators than as generators in LLM planning.

Abstract

Large Language Models (LLMs) with reasoning capabilities offer a promising path for improving candidate evaluation in planning frameworks, but their performance relative to traditional non-reasoning models remains largely underexplored. In this study, we benchmark a distilled 1.5B-parameter reasoning model (DeepSeek-R1) against several state-of-the-art non-reasoning LLMs within a generator-discriminator LLM planning framework for the text-to-SQL task. For this, we introduce a novel method for extracting soft scores from the chain-of-thought (CoT) outputs of reasoning models, which enables fine-grained ranking of candidates. Our central hypothesis is that reasoning models are more effective discriminators than non-reasoning LLMs. Our results show that distilled DeepSeek-R1-1.5B achieves up to $87\%$ higher F1 and $3.7\%$ better discrimination accuracy than CodeLlama-7B, as well as $3.7\%$ …
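The abstract does not spell out how the soft scores are extracted, so the following is only an illustrative sketch of how a discriminator's verdict could be turned into a soft score and used to rank generator candidates. It assumes, hypothetically, that the discriminator ends its chain of thought with a yes/no verdict and that log-probabilities for both verdict tokens are available. The names `Candidate`, `soft_score`, and `rerank` are made up for this example and are not taken from the paper or its repository.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One SQL candidate proposed by the generator (hypothetical structure)."""
    sql: str
    logprob_yes: float  # assumed: log-prob of a "yes" (correct) verdict token
    logprob_no: float   # assumed: log-prob of a "no" (incorrect) verdict token


def soft_score(c: Candidate) -> float:
    """Normalize the two verdict log-probs into a score in [0, 1].

    This is a two-way softmax, computed with the max subtracted
    for numerical stability.
    """
    m = max(c.logprob_yes, c.logprob_no)
    e_yes = math.exp(c.logprob_yes - m)
    e_no = math.exp(c.logprob_no - m)
    return e_yes / (e_yes + e_no)


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least likely correct."""
    return sorted(candidates, key=soft_score, reverse=True)


candidates = [
    Candidate("SELECT name FROM users", math.log(0.4), math.log(0.6)),
    Candidate("SELECT name FROM users WHERE active = 1", math.log(0.9), math.log(0.1)),
    Candidate("SELECT * FROM users", math.log(0.2), math.log(0.8)),
]
ranked = rerank(candidates)
best = ranked[0]
```

A continuous score of this kind separates candidates that a hard yes/no verdict would tie, which is what makes fine-grained ranking possible.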

Code & Models

Repositories

mdfahimanjum/llm-planning-with-reasoning (official)

Taxonomy

Topics: Artificial Intelligence in Law