FOL-Traces: Verified First-Order Logic Reasoning Traces at Scale

Isabelle Lee; Sarah Liaw; Dani Yogatama

arXiv:2505.14932 · cs.AI · January 27, 2026

TL;DR

FOL-Traces is a large-scale, verified dataset for evaluating structured logical inference in language models. It addresses two limitations of prior resources: unverifiable reasoning traces and small dataset sizes.

Contribution

The paper introduces FOL-Traces, the first large-scale, programmatically verified dataset for logical reasoning, along with two diagnostic tasks that evaluate how faithfully models carry out inference.

Findings

01. Models achieve around 45.7% accuracy on masked operation prediction.
02. Models reach about 27% accuracy on two-step completion.
03. FOL-Traces remains a challenging benchmark for reasoning models.

Abstract

Reasoning in language models is difficult to evaluate: natural-language traces are unverifiable, symbolic datasets are too small, and most benchmarks conflate heuristics with inference. We present FOL-Traces, the first large-scale dataset of programmatically verified reasoning traces, enabling rigorous evaluation of structured logical inference. We also propose two challenging and comprehensive diagnostic tasks, masked operation prediction and step completion, that directly probe syntactic awareness and process fidelity. FOL-Traces serves as a scalable testbed for studying how models perform structured logical inference. Systematic experiments with five reasoning LLMs show that the dataset remains challenging: models reach only around 45.7% accuracy on masked operation prediction and around 27% on two-step completion.
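The abstract does not specify the trace format, so what follows is only a minimal sketch under loudly stated assumptions. It illustrates what "programmatically verified" can mean (every step is checked against the earlier steps it cites) and how a masked-operation item could be built by hiding one step's rule name and scoring predictions by exact-match accuracy. The tuple encoding of formulas, the step fields (`formula`, `rule`, `from`, `term`), the two inference rules, and the functions `check_step`, `mask_operation`, and `predict_by_verification` are all hypothetical; they are not the FOL-Traces schema or the authors' code.

```python
from typing import Any, Dict, List, Optional, Tuple

# HYPOTHETICAL encoding for illustration only; not the FOL-Traces format.
#   atom:        ("Man", "socrates")
#   implication: ("->", antecedent, consequent)
#   universal:   ("forall", "x", body)
Formula = Tuple[Any, ...]
Step = Dict[str, Any]  # hypothetical keys: "formula", "rule", "from", optional "term"

RULES = ["premise", "modus_ponens", "universal_instantiation"]  # assumed rule set


def substitute(f: Formula, var: str, const: str) -> Formula:
    """Replace free occurrences of variable `var` with constant `const`."""
    if f[0] == "->":
        return ("->", substitute(f[1], var, const), substitute(f[2], var, const))
    if f[0] == "forall":
        if f[1] == var:  # var is re-bound here, so it is not free below
            return f
        return ("forall", f[1], substitute(f[2], var, const))
    return (f[0],) + tuple(const if arg == var else arg for arg in f[1:])


def check_step(trace: List[Step], i: int) -> bool:
    """Return True if step i follows from earlier steps by its stated rule."""
    step = trace[i]
    rule, srcs, concl = step["rule"], step["from"], step["formula"]
    if any(j < 0 or j >= i for j in srcs):  # only earlier steps may be cited
        return False
    if rule == "premise":
        return srcs == []
    if rule == "modus_ponens" and len(srcs) == 2:
        a, imp = trace[srcs[0]]["formula"], trace[srcs[1]]["formula"]
        return imp[0] == "->" and imp[1] == a and imp[2] == concl
    if rule == "universal_instantiation" and len(srcs) == 1:
        u, term = trace[srcs[0]]["formula"], step.get("term")
        return (u[0] == "forall" and term is not None
                and substitute(u[2], u[1], term) == concl)
    return False


def verify_trace(trace: List[Step]) -> bool:
    """A trace is verified only if every step checks out."""
    return all(check_step(trace, i) for i in range(len(trace)))


def mask_operation(trace: List[Step], i: int) -> Tuple[List[Step], str]:
    """Build a masked-operation item: hide step i's rule, return (item, gold)."""
    masked = [dict(s) for s in trace]  # copy so the original trace is untouched
    gold = masked[i]["rule"]
    masked[i]["rule"] = "[MASK]"
    return masked, gold


def predict_by_verification(masked: List[Step], i: int) -> Optional[str]:
    """Symbolic oracle: return the first rule under which step i checks out."""
    for rule in RULES:
        trial = [dict(s) for s in masked]
        trial[i]["rule"] = rule
        if check_step(trial, i):
            return rule
    return None


def accuracy(preds: List[Optional[str]], golds: List[str]) -> float:
    """Exact-match accuracy over masked items."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


# Toy trace: all men are mortal; Socrates is a man; hence Socrates is mortal.
TRACE: List[Step] = [
    {"formula": ("forall", "x", ("->", ("Man", "x"), ("Mortal", "x"))),
     "rule": "premise", "from": []},
    {"formula": ("Man", "socrates"), "rule": "premise", "from": []},
    {"formula": ("->", ("Man", "socrates"), ("Mortal", "socrates")),
     "rule": "universal_instantiation", "from": [0], "term": "socrates"},
    {"formula": ("Mortal", "socrates"), "rule": "modus_ponens", "from": [1, 2]},
]

if __name__ == "__main__":
    print(verify_trace(TRACE))  # True
    items = [mask_operation(TRACE, i) for i in range(len(TRACE))]
    preds = [predict_by_verification(m, i) for i, (m, _) in enumerate(items)]
    print(accuracy(preds, [g for _, g in items]))  # 1.0
```

Because each step is checkable, a symbolic procedure with access to the verifier recovers every masked rule on this toy trace. The sketch only shows why answers in such a dataset can be checked automatically; it says nothing about how the paper builds its traces, prompts models, or scores the two tasks.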

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Advanced Algebra and Logic · Logic, programming, and type systems