ConvexBench: Can LLMs Recognize Convex Functions?

Yepeng Liu; Yu Huang; Yu-Xiang Wang; Yingbin Liang; Yuheng Bu

arXiv:2602.01075 · cs.AI · February 5, 2026

TL;DR

ConvexBench is a new benchmark designed to evaluate whether Large Language Models can recognize convex functions, revealing significant reasoning limitations at high compositional depths and proposing an agentic divide-and-conquer approach to improve performance.

Contribution

The paper introduces ConvexBench, a scalable benchmark for testing LLMs' ability to identify convexity in deep compositions, and proposes an agentic framework to address reasoning failures.

Findings

01 LLM performance on convexity recognition drops sharply as compositional depth increases, from an F1-score of 1.0 at depth 2 to approximately 0.2 at depth 100.

02 The proposed agentic divide-and-conquer framework significantly improves reasoning accuracy at large depths.

03 Reasoning traces show two primary failure modes: parsing failure and lazy reasoning.

Abstract

Convex analysis is a modern branch of mathematics with many applications. As Large Language Models (LLMs) start to automate research-level math and sciences, it is important for LLMs to demonstrate the ability to understand and reason with convexity. We introduce ConvexBench, a scalable and mechanically verifiable benchmark for testing whether LLMs can identify the convexity of a symbolic objective under deep functional composition. Experiments on frontier LLMs reveal a sharp compositional reasoning gap: performance degrades rapidly with increasing depth, dropping from an F1-score of 1.0 at depth 2 to approximately 0.2 at depth 100. Inspection of models' reasoning traces indicates two failure modes: parsing failure and lazy reasoning. To address these limitations, we propose an agentic divide-and-conquer framework that (i) offloads parsing to an external tool…
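
To make the task concrete, here is a minimal sketch of how the convexity of a deep composition can be certified one layer at a time using the standard composition rules from convex analysis. It is not the authors' framework or the benchmark's verifier. The atom table, the names `Atom`, `compose`, and `chain_curvature`, and the restriction to chains of univariate functions are all assumptions made for illustration, and domain restrictions are ignored.

```python
from dataclasses import dataclass

CONVEX, CONCAVE, AFFINE, UNKNOWN = "convex", "concave", "affine", "unknown"
INC, DEC, NONE = "nondecreasing", "nonincreasing", "none"


@dataclass(frozen=True)
class Atom:
    curvature: str     # curvature of the atom in its own argument
    monotonicity: str  # monotonicity of the atom in its own argument


# Hypothetical atom table for illustration only. Domains are not checked
# (for example, log is only defined for positive arguments).
ATOMS = {
    "exp": Atom(CONVEX, INC),
    "log": Atom(CONCAVE, INC),
    "square": Atom(CONVEX, NONE),
    "neg": Atom(AFFINE, DEC),
}

FLIP = {CONVEX: CONCAVE, CONCAVE: CONVEX, AFFINE: AFFINE, UNKNOWN: UNKNOWN}


def compose(outer: Atom, inner: str) -> str:
    """Curvature of outer(g(x)) given the curvature of g, or UNKNOWN."""
    if inner == UNKNOWN:
        return UNKNOWN
    if inner == AFFINE:
        # Composition with an affine map preserves the outer curvature.
        return outer.curvature
    if outer.curvature == AFFINE:
        # An increasing affine map keeps curvature, a decreasing one flips it.
        if outer.monotonicity == INC:
            return inner
        if outer.monotonicity == DEC:
            return FLIP[inner]
        return UNKNOWN
    # Convex nondecreasing needs a convex inner; convex nonincreasing needs
    # a concave inner. The concave cases are the mirror image.
    needed = {INC: outer.curvature, DEC: FLIP[outer.curvature]}.get(
        outer.monotonicity
    )
    return outer.curvature if inner == needed else UNKNOWN


def chain_curvature(names):
    """Curvature of a chain of atoms applied to x, innermost atom first."""
    curvature = AFFINE  # the variable x itself is affine
    for name in names:
        curvature = compose(ATOMS[name], curvature)
    return curvature


print(chain_curvature(["square", "exp"]))         # exp(x^2)
print(chain_curvature(["log", "neg"]))            # -log(x)
print(chain_curvature(["square"] + ["exp"] * 100))  # depth-101 chain
print(chain_curvature(["exp", "square"]))         # (exp(x))^2
```

These rules are sufficient but not necessary: a result of `unknown` means the rules could not certify the curvature, not that the function is non-convex. For instance, `(exp(x))^2` equals `exp(2x)` and is convex, yet the sketch reports `unknown` because `square` is not monotone. Handling each layer locally keeps the per-step work constant regardless of depth, which is the general idea behind a divide-and-conquer treatment of deep compositions.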

Code & Models

Datasets

chiffonng/fatima-prework
dataset · 21 dl

Taxonomy

Topics: Machine Learning in Materials Science · Natural Language Processing Techniques · Topic Modeling