Playing Psychic: Using Thought Trees to Predict Reasoning Models Accuracy on Coding Tasks

Jiaxin Fang; Runyuan He; Sahil Bhatia; Neel Gajare; Alvin Cheung

arXiv:2604.16931·cs.AI·April 21, 2026

TL;DR

This paper investigates how the structure of reasoning traces in large language models affects their accuracy on coding tasks, proposing thought-trees to improve prediction and reliability.

Contribution

It introduces a method to generate diverse coding tasks, analyzes reasoning trace structures, and develops thought-trees and classifiers to predict and enhance model correctness.

Findings

01. Trace structure strongly predicts correctness.

02. Flagging anomalous traces improves accuracy.

03. Thought-trees enable better prediction of reasoning success.

Abstract

Recent advances in large language models (LLMs) have shown that test-time scaling can substantially improve model performance on complex tasks, particularly in the coding domain. Under this paradigm, models use a larger token budget during inference to generate intermediate reasoning traces before producing a final answer. However, current evaluations primarily rely on competitive programming benchmarks, which may not capture the full range of reasoning abilities. In this work, we perform a systematic study of frontier reasoning models to understand their performance on real-world coding benchmarks. To gain more insights into the performance of such models, we devise a programmatic way to automatically generate coding tasks of arbitrary difficulty and structure from existing benchmarks. Using this framework, our analysis reveals that the structure of a reasoning trace, not just…
