UniCode: Augmenting Evaluation for Code Reasoning

Xinyue Zheng; Haowei Lin; Shaofei Cai; Zilong Zheng; Yaodong Yang; Yitao Liang

arXiv:2510.17868 · cs.SE · February 17, 2026

Open Access

TL;DR

UniCode is a generative evaluation framework that challenges large language models with augmented, more complex variations of seed coding problems, revealing weaknesses in conceptual modeling and scalability reasoning.

Contribution

The paper introduces UniCode, a novel generative evaluation framework that systematically probes LLMs with augmented problems and fine-grained metrics, exposing reasoning weaknesses.

Findings

01  State-of-the-art models show a 31.2% performance drop on UniCode.

02  Models revert to memorized seed-problem logic rather than reasoning.

03  UniCode exposes model fragility in code reasoning.

Abstract

Current coding benchmarks often inflate Large Language Model (LLM) capabilities due to static paradigms and data contamination, enabling models to exploit statistical shortcuts rather than genuine reasoning. To address this, we introduce UniCode, a generative evaluation framework that systematically probes LLM limits via: (1) multi-dimensional augmentation transforming seed problems into complex variations to disrupt fixed algorithmic patterns; (2) a highly reliable, automated test generation pipeline for scalable evaluation; and (3) fine-grained metrics for rich error signals. Experiments reveal a 31.2% performance collapse in state-of-the-art models on UniCode, primarily driven by deficiencies in conceptual modeling and scalability reasoning rather than syntactic errors. Furthermore, we uncover a seed-problem regression where models revert to memorized seed logic rather than following…
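To make the idea of fine-grained metrics concrete, the following is a minimal sketch of how a per-test score and an error breakdown carry more signal than a single pass/fail bit. It is an illustration only, not the metric defined in the paper: the `CaseResult` type, the error labels, and all function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of running one generated test case (hypothetical structure)."""
    passed: bool
    error: str = ""  # hypothetical label, e.g. "wrong_answer" or "timeout"


def binary_pass(results):
    """Coarse metric: solved only if there is at least one test and all pass."""
    return bool(results) and all(r.passed for r in results)


def pass_fraction(results):
    """Finer metric: share of test cases passed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def error_breakdown(results):
    """Count failures by error label to show why a solution failed."""
    return Counter(r.error for r in results if not r.passed)


# Illustrative data, not taken from the paper.
results = [
    CaseResult(True),
    CaseResult(True),
    CaseResult(False, "timeout"),
    CaseResult(False, "wrong_answer"),
]

print(binary_pass(results))
print(pass_fraction(results))
print(error_breakdown(results))
```

Under these assumptions, a solution that fails only on large inputs (timeouts) is distinguishable from one that is logically wrong, which a binary score would hide.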

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research