ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering

Yuki Imajuku; Kohki Horie; Yoichi Iwata; Kensho Aoki; Naohiro Takahashi; Takuya Akiba

arXiv:2506.09050 · cs.AI · October 7, 2025

TL;DR

ALE-Bench is a new benchmark designed to evaluate AI systems' ability to solve complex, long-horizon optimization problems in domains like routing and scheduling, emphasizing iterative refinement and real-world applicability.

Contribution

The paper introduces a benchmark built from real, computationally hard optimization tasks from the AtCoder Heuristic Contests, together with a framework that supports interactive, feedback-driven AI agents working over long time horizons.

Findings

01. Frontier LLMs perform well on specific problems but lack consistency across problems.

02. A significant gap remains between AI and human performance on long-horizon tasks.

03. The benchmark is meant to encourage AI systems with stronger iterative-refinement and feedback-use capabilities.

Abstract

How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of…
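To make the contrast with pass/fail grading concrete, here is a minimal sketch of score-driven iterative refinement on a toy routing instance. The instance, the `tour_length` scorer, and the `refine` loop are hypothetical illustrations, not part of the ALE-Bench framework or its API. Each call to the scorer stands in for a test run that returns a numeric score rather than a pass/fail verdict, and the random 2-opt hill climber is a generic local-search technique chosen only for brevity.

```python
import math
import random

# Hypothetical toy instance: points visited by a single closed delivery route.
# This is NOT the ALE-Bench API; it only illustrates score-driven refinement.
Point = tuple[float, float]


def tour_length(points: list[Point], order: list[int]) -> float:
    """Score of a solution: total length of the closed tour (lower is better)."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def refine(
    points: list[Point], iterations: int = 2000, seed: int = 0
) -> tuple[list[int], list[float]]:
    """Hill climbing with random 2-opt moves; each evaluation stands in for a test run."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    best = tour_length(points, order)
    history = [best]  # score after each accepted improvement
    for _ in range(iterations):
        i, j = sorted(rng.sample(range(len(order)), 2))
        # 2-opt move: reverse the segment order[i..j] (inclusive).
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        score = tour_length(points, candidate)
        if score < best:  # accept only strict improvements reported by the scorer
            order, best = candidate, score
            history.append(best)
    return order, history


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [(rng.random(), rng.random()) for _ in range(30)]
    order, history = refine(pts)
    print(
        f"initial score {history[0]:.3f} -> final score {history[-1]:.3f} "
        f"after {len(history) - 1} improvements"
    )
```

The point of the sketch is the loop shape: a solution is never simply "accepted" or "rejected" as in pass/fail benchmarks; it is repeatedly modified, re-scored, and kept only when the score improves, which is the kind of long-horizon refinement the benchmark is designed to reward.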

Code & Models

Repositories

sakanaai/ale-bench (official)

Datasets

SakanaAI/ALE-Bench
Dataset · 4.1k downloads

Taxonomy

Topics: Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning · Multimodal Machine Learning Applications