ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku, Kohki Horie, Yoichi Iwata, Kensho Aoki, Naohiro Takahashi, Takuya Akiba

TL;DR
ALE-Bench is a new benchmark designed to evaluate AI systems' ability to solve complex, long-horizon optimization problems in domains like routing and scheduling, emphasizing iterative refinement and real-world applicability.
Contribution
The paper introduces a benchmark built on real, hard optimization tasks from the AtCoder Heuristic Contests, together with a software framework that supports interactive, feedback-driven agents for long-horizon problem solving.
Findings
Frontier LLMs perform well on specific problems but lack consistency.
A significant gap remains between AI and human performance on long-horizon tasks.
The benchmark encourages the development of AI systems that are better at iterative refinement and at using feedback.
Abstract
How well do AI systems perform in algorithm engineering for hard optimization problems in domains such as package-delivery routing, crew scheduling, factory production planning, and power-grid balancing? We introduce ALE-Bench, a new benchmark for evaluating AI systems on score-based algorithmic programming contests. Drawing on real tasks from the AtCoder Heuristic Contests, ALE-Bench presents optimization problems that are computationally hard and admit no known exact solution. Unlike short-duration, pass/fail coding benchmarks, ALE-Bench encourages iterative solution refinement over long time horizons. Our software framework supports interactive agent architectures that leverage test-run feedback and visualizations. Our evaluation of frontier LLMs revealed that while they demonstrate high performance on specific problems, a notable gap remains compared to humans in terms of…
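To make the contrast with pass/fail benchmarks concrete, the following minimal Python sketch shows score-driven iterative refinement on a toy routing problem. It is an illustration only, not the ALE-Bench framework or its API: the function names `tour_length` and `refine`, the random 30-point instance, and the simple hill-climbing strategy are all assumptions made for this example.

```python
import math
import random


def tour_length(points, order):
    """Score: total length of the closed tour visiting points in this order."""
    n = len(order)
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % n]])
        for i in range(n)
    )


def refine(points, iterations=2000, seed=0):
    """Hill climbing with segment reversals; returns (order, score history).

    Hypothetical helper for illustration, not part of ALE-Bench.
    """
    rng = random.Random(seed)
    order = list(range(len(points)))
    best = tour_length(points, order)
    history = [best]
    for _ in range(iterations):
        i, j = sorted(rng.sample(range(len(points)), 2))
        # Propose a small change: reverse the segment between i and j.
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        score = tour_length(points, candidate)
        if score < best:  # keep the change only when the score improves
            order, best = candidate, score
        history.append(best)
    return order, history


# Toy instance (assumed for this example): 30 random points in the unit square.
instance_rng = random.Random(42)
points = [(instance_rng.random(), instance_rng.random()) for _ in range(30)]
order, history = refine(points)
print(f"initial {history[0]:.3f} -> final {history[-1]:.3f}")
```

There is no single correct answer to check here. Each candidate receives a numeric score, and the solution improves gradually as feedback from each evaluation guides the next change, which is the property that distinguishes score-based contests from pass/fail coding tasks.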
Taxonomy
Topics: Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning · Multimodal Machine Learning Applications
