Large Language Models are Algorithmically Blind
Sohan Venkatesh, Ashish Mahendran Kurapath, Tejas Melkote

TL;DR
This paper investigates the reasoning capabilities of large language models regarding algorithms, revealing widespread failure and a fundamental gap in their understanding of computational processes.
Contribution
It introduces the concept of algorithmic blindness in LLMs and empirically demonstrates their systematic failure to reason about algorithms.
Findings
Most models perform worse than random guessing.
Predicted confidence intervals are wider than true intervals.
Best model's marginal gains are due to memorization, not reasoning.
Abstract
Large language models (LLMs) demonstrate remarkable breadth of knowledge, yet their ability to reason about computational processes remains poorly understood. Closing this gap matters for practitioners who rely on LLMs to guide algorithm selection and deployment. We address this limitation using causal discovery as a testbed and evaluate eight frontier LLMs against ground truth derived from algorithm executions. We find systematic, near-total failure across models. The predicted ranges are far wider than true confidence intervals yet still fail to contain the true algorithmic mean in most cases. Most models perform worse than random guessing and the best model's marginal improvement is attributable to benchmark memorization rather than principled reasoning. We term this failure algorithmic blindness and argue it reflects a fundamental gap between declarative knowledge about algorithms…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
