The Path Not Taken: Duality in Reasoning about Program Execution

Eshgin Hasanov; Md Mahadi Hassan Sibat; Santu Karmaker; Aashish Yadavally

arXiv:2604.20917·cs.LG·April 24, 2026

TL;DR

This paper introduces DexBench, a benchmark of paired (dual) reasoning tasks that tests whether large language models causally understand program execution rather than relying on surface-level patterns.

Contribution

The paper proposes a duality-based evaluation framework, and a benchmark built on it, for assessing models' dynamic code reasoning.

Findings

01. Dual-path reasoning correlates with better code understanding.

02. The 13 evaluated LLMs show varying performance on the dual tasks.

03. DexBench effectively discriminates between models' causal reasoning abilities.

Abstract

Large language models (LLMs) have shown remarkable capabilities across diverse coding tasks. However, their adoption requires a true understanding of program execution rather than relying on surface-level patterns. Existing benchmarks primarily focus on predicting program properties tied to specific inputs (e.g., code coverage, program outputs). As a result, they provide a narrow view of dynamic code reasoning and are prone to data contamination. We argue that understanding program execution requires evaluating its inherent duality through two complementary reasoning tasks: (i) predicting a program's observed behavior for a given input, and (ii) inferring how the input must be mutated toward a specific behavioral objective. Both tasks jointly probe a model's causal understanding of execution flow. We instantiate this duality in DexBench, a benchmark comprising 445 paired instances, and…
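
The two tasks named in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function `classify`, the helper names, and the brute-force search are hypothetical and are not taken from DexBench or from the paper's method. The sketch only shows what a forward query (predict behavior for an input) and its dual (mutate the input toward a target behavior) might look like for one small program.

```python
from typing import Optional


def classify(n: int) -> str:
    """Hypothetical program under test with three execution paths."""
    if n < 0:
        return "negative"
    if n % 2 == 0:
        return "even"
    return "odd"


def observed_behavior(x: int) -> str:
    """Task (i), forward direction: the behavior the program shows for input x."""
    return classify(x)


def mutate_toward(x: int, target: str, radius: int = 10) -> Optional[int]:
    """Task (ii), dual direction: find an input near x whose behavior is `target`.

    This is an illustrative brute-force search over growing offsets, not the
    paper's procedure. Returns None if no input within `radius` qualifies.
    """
    for delta in range(radius + 1):
        for candidate in (x + delta, x - delta):
            if classify(candidate) == target:
                return candidate
    return None


if __name__ == "__main__":
    print(observed_behavior(7))          # odd
    print(mutate_toward(7, "even"))      # 8
    print(mutate_toward(7, "negative"))  # -1
```

In this toy setting the ground truth for both directions comes from running the program, so a model's answer to either query could be checked by execution. Whether DexBench scores answers this way is not stated in the text above.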
