When Small Models Are Right for Wrong Reasons: Process Verification for Trustworthy Agents

Laksh Advani

arXiv:2601.00513 · cs.LG · January 5, 2026

TL;DR

This paper reports that 50-69% of correct answers from small language models (7-9B parameters) rest on flawed reasoning, and introduces a process verification metric to make such agents more trustworthy.

Contribution

It introduces the Reasoning Integrity Score (RIS), a process-based metric validated across diverse tasks, and analyzes the effects of retrieval augmentation and meta-cognition on reasoning quality.

Findings

01. 50-69% of correct answers contain flawed reasoning

02. Retrieval-augmented generation improves reasoning integrity significantly

03. Meta-cognitive interventions can harm performance in small models
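
The first finding is a conditional rate: among answers that are correct, the share whose reasoning was judged flawed. A minimal sketch of that arithmetic follows, assuming each reasoning trace has already been labeled. The field names `answer_correct` and `reasoning_sound` and the sample records are hypothetical, and this is not the paper's RIS scoring procedure, which is not described on this page.

```python
def right_for_wrong_reasons_rate(traces):
    """Fraction of correct answers whose reasoning was judged flawed.

    `traces` is a list of dicts with two hypothetical boolean fields:
    `answer_correct` and `reasoning_sound`.
    """
    correct = [t for t in traces if t["answer_correct"]]
    if not correct:
        raise ValueError("no correct answers to condition on")
    flawed = sum(1 for t in correct if not t["reasoning_sound"])
    return flawed / len(correct)


# Illustrative labels only, not data from the paper.
example_traces = [
    {"answer_correct": True, "reasoning_sound": True},
    {"answer_correct": True, "reasoning_sound": False},
    {"answer_correct": True, "reasoning_sound": False},
    {"answer_correct": False, "reasoning_sound": False},
]
rate = right_for_wrong_reasons_rate(example_traces)
```

Incorrect answers are excluded from the denominator, which is why plain accuracy cannot reveal this rate.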

Abstract

Deploying small language models (7-9B parameters) as autonomous agents requires trust in their reasoning, not just their outputs. We reveal a critical reliability crisis: 50-69% of correct answers from these models contain fundamentally flawed reasoning, a "Right-for-Wrong-Reasons" phenomenon invisible to standard accuracy metrics. Through analysis of 10,734 reasoning traces across three models and diverse tasks, we introduce the Reasoning Integrity Score (RIS), a process-based metric validated with substantial inter-rater agreement ($\kappa = 0.657$). Our findings challenge conventional practices: while retrieval-augmented generation (RAG) significantly improves reasoning integrity (Cohen's $d = 0.23$ to $0.93$), meta-cognitive interventions like self-critique often harm performance ($d = -0.14$ to $-0.33$) in small models on the evaluated tasks. Mechanistic analysis reveals RAG…
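
The abstract reports two standard statistics: Cohen's $\kappa$ for inter-rater agreement and Cohen's $d$ for effect size. The sketch below shows the textbook definitions of both, assuming two raters assign categorical labels to the same items and that $d$ uses a pooled standard deviation. The function names and inputs are illustrative, and the paper's exact computation may differ.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample variance."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labeled independently.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)


# Illustrative inputs only, not data from the paper.
d = cohens_d([2, 4, 6], [1, 3, 5])
kappa = cohens_kappa(["sound", "sound", "flawed", "flawed"],
                     ["sound", "flawed", "flawed", "flawed"])
```

A positive $d$ means the first group scored higher on average than the second, so the negative values reported for self-critique indicate scores below the comparison condition.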

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Ethics and Social Impacts of AI