AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming

Jierui Li; Raymond Mooney

arXiv:2507.15378 · cs.CL · July 22, 2025

TL;DR

This paper introduces AlgoSimBench, a benchmark for testing LLMs' ability to identify algorithmically similar problems in competitive programming, revealing current limitations and proposing methods to improve detection accuracy.

Contribution

The paper presents AlgoSimBench, a new benchmark of competitive programming problems annotated with fine-grained algorithm tags, together with a novel attempted solution matching method that improves LLMs' detection of algorithmic similarity.

Findings

01

LLMs struggle to identify ASPs: the best-performing model achieves only 65.9% accuracy on the multiple-choice task.

02

Attempted solution matching (ASM) improves absolute accuracy by 6.7 to 11.7 percentage points across different models.

03

For retrieval-based identification, combining ASM with BM25, a keyword-prioritized ranking method, yields up to 52.2% accuracy.
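
As a rough illustration of the BM25 component named in the third finding, the sketch below scores a handful of candidate texts against a query text with the standard Okapi BM25 formula and picks the top-scoring candidate. The function names, the toy texts, the whitespace tokenizer, and the parameter defaults (k1 = 1.5, b = 0.75) are assumptions made for illustration, not the paper's implementation. In an ASM-style setup, the texts being compared would presumably be model-generated attempted solutions rather than the original problem statements.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def bm25_scores(query, candidates, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per candidate for the given query."""
    docs = [tokenize(c) for c in candidates]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of candidates containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in term_freq:
                continue
            # Smoothed IDF; the +1 inside the log keeps it non-negative.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            tf = term_freq[term]
            # Length normalization: longer-than-average texts are penalized.
            norm = tf + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def pick_most_similar(query, candidates):
    """Return the index of the candidate with the highest BM25 score."""
    scores = bm25_scores(query, candidates)
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical toy texts standing in for attempted solutions.
query_text = "binary search on the answer with a greedy feasibility check"
candidate_texts = [
    "sort the intervals and sweep with a priority queue",
    "binary search on the answer then run a greedy check",
    "dynamic programming over subsets with bitmask states",
    "breadth first search on a grid with obstacles",
]

print(pick_most_similar(query_text, candidate_texts))
```

Because BM25 rewards rare shared terms, it favors the candidate that shares algorithm-specific vocabulary with the query over candidates that only share common words.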

Abstract

Recent progress in LLMs, such as reasoning models, has demonstrated strong abilities to solve complex competitive programming problems, often rivaling top human competitors. However, it remains underexplored whether these abilities generalize to relevant domains that are less seen during training. To address this, we introduce AlgoSimBench, a new benchmark designed to assess LLMs' ability to identify algorithmically similar problems (ASPs): problems that can be solved using similar algorithmic approaches. AlgoSimBench consists of 1317 problems, annotated with 231 distinct fine-grained algorithm tags, from which we curate 402 multiple-choice questions (MCQs), where each question presents one algorithmically similar problem alongside three textually similar but algorithmically dissimilar distractors. Our evaluation reveals that LLMs struggle to identify ASPs, with the best-performing model…
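
The MCQ format described above (one algorithmically similar problem plus three distractors) can be sketched as a small data structure with an accuracy function. This is a minimal illustration under stated assumptions: the class name `MCQ`, its field names, the `accuracy` helper, and the placeholder strings are hypothetical and do not reflect the benchmark's actual data schema or evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class MCQ:
    """One multiple-choice question (hypothetical schema)."""
    query: str              # the reference problem statement
    options: Sequence[str]  # four candidates: one similar problem, three distractors
    answer: int             # index of the algorithmically similar option


def accuracy(questions: Sequence[MCQ], choose: Callable[[MCQ], int]) -> float:
    """Fraction of questions on which `choose` picks the correct option."""
    correct = sum(1 for q in questions if choose(q) == q.answer)
    return correct / len(questions)


# With four options per question, uniform random guessing scores 25% in expectation.
CHANCE_ACCURACY = 1 / 4

# Placeholder questions; real entries would hold full problem statements.
toy_questions = [
    MCQ("<problem 1>", ["<opt A>", "<opt B>", "<opt C>", "<opt D>"], answer=0),
    MCQ("<problem 2>", ["<opt A>", "<opt B>", "<opt C>", "<opt D>"], answer=2),
]


def always_first(question: MCQ) -> int:
    """Trivial baseline chooser that always selects the first option."""
    return 0


print(accuracy(toy_questions, always_first))
```

Any selection strategy, whether an LLM prompt or a retrieval score, can be plugged in as `choose`, which makes it straightforward to compare against the 25% chance level.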
