P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains
Simeng Han, Aaron Yu, Rui Shen, Zhenting Qi, Martin Riddell, Wenfei Zhou, Yujie Qiao, Yilun Zhao, Semih Yavuz, Ye Liu, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Dragomir Radev, Rex Ying, Arman Cohan

TL;DR
P-FOLIO is a human-annotated dataset of complex, step-by-step logical reasoning chains designed to evaluate and enhance the reasoning capabilities of large language models, going beyond prior evaluations that rely on binary entailment labels or synthetically generated rationales.
Contribution
The paper introduces P-FOLIO, a detailed dataset of human-written reasoning chains for first-order logic, and shows how it can be used both to evaluate LLM reasoning step by step and to improve it through fine-tuning.
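To make "step-by-step reasoning chain" concrete, here is a minimal sketch of how such a chain could be represented. The schema (`Step`, `ReasoningChain`, the `P1`/`S1` id convention, and the well-formedness check) is a hypothetical illustration, not P-FOLIO's actual data format. It assumes each step states a conclusion, names the inference rule applied, and cites premises or earlier steps. An empty step list models the 0-step chains at the low end of P-FOLIO's 0-to-20-step range.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a natural-language proof (hypothetical schema, not P-FOLIO's format)."""
    conclusion: str        # statement derived at this step
    inference_rule: str    # name of the rule applied, e.g. "modus ponens"
    uses: list[str]        # ids of premises ("P1", ...) or earlier steps ("S1", ...)


@dataclass
class ReasoningChain:
    premises: dict[str, str]                         # premise id -> premise text
    steps: list[Step] = field(default_factory=list)  # empty list = a 0-step chain

    def num_steps(self) -> int:
        return len(self.steps)

    def is_well_formed(self) -> bool:
        """True if every step cites at least one premise or strictly earlier step."""
        available = set(self.premises)
        for i, step in enumerate(self.steps, start=1):
            if not step.uses or any(ref not in available for ref in step.uses):
                return False
            available.add(f"S{i}")  # step i becomes citable by later steps
        return True


chain = ReasoningChain(
    premises={"P1": "All dogs are mammals.", "P2": "Rex is a dog."},
    steps=[
        Step("If Rex is a dog, then Rex is a mammal.", "universal instantiation", ["P1"]),
        Step("Rex is a mammal.", "modus ponens", ["S1", "P2"]),
    ],
)
print(chain.num_steps(), chain.is_well_formed())  # 2 True
```

Recording the rule name on every step is what makes finer-grained evaluation possible: each step can be pulled out as a single-step item whose label is its inference rule.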
Findings
Human-written reasoning chains boost LLM reasoning performance.
Fine-tuning Llama3-7B on P-FOLIO improves accuracy by over 10% on out-of-domain logical reasoning datasets.
Diverse and complex reasoning chains reveal specific reasoning shortcomings in LLMs.
Abstract
Existing methods for understanding the capabilities of LLMs in logical reasoning rely on binary entailment classification or synthetically derived rationales, which are not sufficient for a proper investigation of models' capabilities. We present P-FOLIO, a human-annotated dataset consisting of diverse and complex reasoning chains for a set of realistic logical reasoning stories, also written by humans. P-FOLIO is collected with an annotation protocol that enables humans to annotate well-structured natural language proofs for first-order logic reasoning problems in a step-by-step manner. The number of reasoning steps in P-FOLIO spans from 0 to 20. We further use P-FOLIO to evaluate and improve the reasoning capabilities of large language models (LLMs). We evaluate LLM reasoning capabilities at a fine granularity via single-step inference rule classification, with more diverse inference rules of…
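The abstract does not spell out how single-step inference rule classification is scored, so the following is only a minimal sketch under stated assumptions: each item is one reasoning step labeled with a single gold rule, and the model predicts one rule label per item. The function names and the example labels are hypothetical and are not the paper's evaluation code. Breaking accuracy down by gold rule is one simple way such an evaluation can expose rule-specific weaknesses that a single aggregate score hides.

```python
from collections import Counter


def overall_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of items whose predicted rule matches the gold rule."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("need non-empty, equal-length label lists")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_rule_accuracy(gold: list[str], predicted: list[str]) -> dict[str, float]:
    """Accuracy grouped by gold inference rule, to expose rule-specific weaknesses."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = Counter(gold)  # number of items per gold rule
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)  # hits per gold rule
    return {rule: correct[rule] / total[rule] for rule in total}


gold = ["modus ponens", "modus ponens", "modus tollens", "disjunctive syllogism"]
pred = ["modus ponens", "modus ponens", "modus ponens", "disjunctive syllogism"]
print(overall_accuracy(gold, pred))   # 0.75
print(per_rule_accuracy(gold, pred))
# {'modus ponens': 1.0, 'modus tollens': 0.0, 'disjunctive syllogism': 1.0}
```

In this toy example the overall score of 0.75 looks reasonable, while the per-rule view shows that every modus tollens item was misclassified as modus ponens.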
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
Methods: Sparse Evolutionary Training
