FOLIO: Natural Language Reasoning with First-Order Logic
Simeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni

TL;DR
FOLIO is a new dataset with first-order logic annotations designed to evaluate and improve the logical reasoning capabilities of large language models in natural language understanding tasks.
Contribution
The paper introduces FOLIO, a human-annotated, logically complex dataset with FOL annotations, and benchmarks LLMs' reasoning abilities on it.
Findings
FOLIO challenges current state-of-the-art LLMs, including GPT-4.
FOLIO enables systematic evaluation of NL reasoning and translation.
Benchmark results highlight gaps in LLMs' logical reasoning skills.
Abstract
Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning…
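To make the deductive task in the abstract concrete, here is a minimal toy sketch of labelling a conclusion against a set of premises. It is not FOLIO's inference engine or data format: the predicates, entities, premises, and the `label` helper are all hypothetical, and it assumes a three-way True/False/Unknown labelling. It also brute-forces truth assignments over a tiny fixed domain, which only illustrates the idea and is not a sound procedure for first-order logic in general.

```python
from itertools import product

# Hypothetical toy vocabulary; none of this comes from the FOLIO data.
DOMAIN = ["socrates", "plato"]
PREDICATES = ["Human", "Mortal", "Philosopher"]


def all_models():
    """Yield every truth assignment to the atoms predicate(entity)."""
    atoms = [(p, e) for p in PREDICATES for e in DOMAIN]
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


# Premises written as Python functions over a model:
#   forall x. Human(x) -> Mortal(x)
#   Human(socrates)
premises = [
    lambda m: all((not m[("Human", e)]) or m[("Mortal", e)] for e in DOMAIN),
    lambda m: m[("Human", "socrates")],
]


def label(conclusion):
    """Classify a conclusion by checking it in every model of the premises."""
    satisfying = [m for m in all_models() if all(p(m) for p in premises)]
    results = {conclusion(m) for m in satisfying}
    if results == {True}:
        return "True"      # holds in every model of the premises
    if results == {False}:
        return "False"     # fails in every model of the premises
    return "Unknown"       # undetermined (or premises have no model)


mortal = label(lambda m: m[("Mortal", "socrates")])
not_mortal = label(lambda m: not m[("Mortal", "socrates")])
philosopher = label(lambda m: m[("Philosopher", "socrates")])
print(mortal, not_mortal, philosopher)
```

A real FOL prover reasons symbolically rather than by enumeration, so this sketch only conveys what "follows", "is contradicted", and "is undetermined" mean for a conclusion.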
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Attention Is All You Need · OPT · Linear Layer · Cosine Annealing · Dense Connections · Weight Decay · Dropout · Byte Pair Encoding
