ABD: Default Exception Abduction in Finite First Order Worlds

Serafim Batzoglou

arXiv:2602.18843 · cs.AI · May 5, 2026

TL;DR

ABD is a benchmark for default-exception abduction in finite first-order worlds. It evaluates whether language models can generate sparse exception formulas under three observation regimes: closed-world, existential completion, and universal completion.

Contribution

Introduces the ABD benchmark, which formalizes three observation regimes with exact SMT verification, and uses it to assess how well LLMs generate sparse exception formulas in first-order logic.

Findings

01

The best of the ten evaluated models achieve high validity: their formulas restore satisfiability.

02

Parsimony gaps remain: even valid formulas often mark more exceptions than necessary, so minimal explanations are still hard to produce.

03

Holdout evaluation reveals distinct generalization failure modes across the three regimes.

Abstract

We introduce ABD, a benchmark for default-exception abduction over finite first-order worlds. Given a background theory with an abnormality predicate and a set of relational structures, a model must output a first-order formula that defines exceptions, restoring satisfiability while keeping exceptions sparse. We formalize three observation regimes (closed-world, existential completion, universal completion) with exact SMT verification. Evaluating ten frontier LLMs on 600 instances, the best models achieve high validity but parsimony gaps remain, and holdout evaluation reveals distinct generalization failure modes across regimes.
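
To make the task concrete, here is a minimal sketch of the closed-world case in plain Python. It is an illustration only, not the paper's encoding or verifier: the default rule (birds fly unless abnormal), the predicate names, the two toy worlds, and the helper functions are all assumptions made for this example, and brute-force evaluation over a finite domain stands in for the SMT check. A candidate exception formula is valid if the default rule holds in every world once the formula defines the abnormality predicate, and its cost is the number of elements it marks as exceptions.

```python
from dataclasses import dataclass

# Hypothetical toy setup: one default rule,
#   forall x. Bird(x) and not Ab(x) -> Flies(x),
# and worlds given as closed-world facts (anything not listed is false).

@dataclass(frozen=True)
class World:
    domain: frozenset
    bird: frozenset
    penguin: frozenset
    flies: frozenset

def default_holds(world, ab):
    """True if the default rule holds in `world` when Ab is defined by `ab`."""
    return all(
        (x not in world.bird) or ab(world, x) or (x in world.flies)
        for x in world.domain
    )

def exception_count(world, ab):
    """Number of domain elements the candidate formula marks as abnormal."""
    return sum(1 for x in world.domain if ab(world, x))

def evaluate(worlds, ab):
    """Return (valid, cost): validity across all worlds, total exceptions."""
    valid = all(default_holds(w, ab) for w in worlds)
    cost = sum(exception_count(w, ab) for w in worlds)
    return valid, cost

worlds = [
    World(
        domain=frozenset({"a", "b", "c"}),
        bird=frozenset({"a", "b"}),
        penguin=frozenset({"b"}),
        flies=frozenset({"a"}),
    ),
    World(
        domain=frozenset({"d", "e"}),
        bird=frozenset({"d", "e"}),
        penguin=frozenset({"e"}),
        flies=frozenset({"d"}),
    ),
]

# Candidate definitions of Ab(x), written as Python predicates.
candidates = {
    "false": lambda w, x: False,               # no exceptions at all
    "bird": lambda w, x: x in w.bird,          # every bird is an exception
    "penguin": lambda w, x: x in w.penguin,    # only penguins are exceptions
}

results = {name: evaluate(worlds, ab) for name, ab in candidates.items()}
for name, (valid, cost) in results.items():
    print(f"Ab(x) := {name}(x): valid={valid}, exceptions={cost}")
```

In this toy setting, declaring no exceptions is invalid because the non-flying birds violate the default, while marking every bird abnormal is valid but wasteful. Marking only penguins is both valid and sparser, which is the kind of gap between validity and parsimony the benchmark measures.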
