Can LLMs Solve ASP Problems? Insights from a Benchmarking Study (Extended Version)
Lin Ren, Guohui Xiao, Guilin Qi, Yishuai Geng, and Haohan Xue

TL;DR
This paper introduces ASPBench, a comprehensive benchmark for evaluating large language models on Answer Set Programming tasks, revealing their strengths and limitations in ASP solving and highlighting the need for more integrated reasoning approaches.
Contribution
The paper presents ASPBench, a new benchmark with three ASP-specific tasks, and provides extensive evaluation of LLMs, exposing their challenges in answer set computation.
Findings
LLMs perform relatively well on the two simpler tasks, ASP entailment and answer set verification.
LLMs struggle significantly with answer set computation.
Current LLMs have limitations in core ASP solving capabilities.
Abstract
Answer Set Programming (ASP) is a powerful paradigm for non-monotonic reasoning. Recently, large language models (LLMs) have demonstrated promising capabilities in logical reasoning. Despite this potential, current evaluations of LLM capabilities in ASP are often limited. Existing work typically uses overly simplified ASP programs and does not support negation, disjunction, or multiple answer sets. Furthermore, there is a lack of benchmarks that introduce tasks specifically designed for ASP solving. To bridge this gap, we introduce ASPBench, a comprehensive ASP benchmark that includes three ASP-specific tasks: ASP entailment, answer set verification, and answer set computation. Our extensive evaluations on ASPBench reveal that while 14 state-of-the-art LLMs, including deepseek-r1, o4-mini, and gemini-2.5-flash-thinking, perform relatively well on the first two simpler…
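To make the three task types concrete, here is a minimal Python sketch on a toy propositional program with default negation and two answer sets. It is an illustration only, not the benchmark's code or data: the program, the function names, and the brute-force search are all assumptions, and "entailment" is read here as cautious entailment (true in every answer set), which may differ from the paper's exact task definition.

```python
from itertools import chain, combinations

# Hypothetical toy program (not taken from ASPBench):
#   p :- not q.    q :- not p.    r :- p.
# Each rule is (head, positive_body, negative_body); atoms are plain strings.
PROGRAM = [
    ("p", frozenset(), frozenset({"q"})),
    ("q", frozenset(), frozenset({"p"})),
    ("r", frozenset({"p"}), frozenset()),
]


def atoms_of(program):
    """Collect every atom mentioned in the program."""
    atoms = set()
    for head, pos, neg in program:
        atoms |= {head} | pos | neg
    return atoms


def least_model(positive_rules):
    """Least model of a negation-free program, by fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model


def is_answer_set(program, candidate):
    """Answer set verification via the Gelfond-Lifschitz reduct."""
    candidate = set(candidate)
    # Drop rules blocked by the candidate, then strip the negative bodies.
    reduct = [(h, pos) for h, pos, neg in program if not (neg & candidate)]
    return least_model(reduct) == candidate


def answer_sets(program):
    """Answer set computation by brute force over all subsets of atoms."""
    atoms = sorted(atoms_of(program))
    subsets = chain.from_iterable(
        combinations(atoms, k) for k in range(len(atoms) + 1)
    )
    return [set(s) for s in subsets if is_answer_set(program, s)]


def cautiously_entails(program, atom):
    """Entailment, read as: the atom holds in every answer set.

    Vacuously true when the program has no answer sets.
    """
    return all(atom in s for s in answer_sets(program))


if __name__ == "__main__":
    print(answer_sets(PROGRAM))              # answer set computation
    print(is_answer_set(PROGRAM, {"p"}))     # answer set verification
    print(cautiously_entails(PROGRAM, "r"))  # entailment
```

The exhaustive search is exponential in the number of atoms, so it only suits toy inputs; a production ASP solver would be the realistic choice for programs of benchmark size. Verification checks a single candidate, while computation must find every stable model.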
