LFC-DA: Logical Formula-Controlled Data Augmentation for Enhanced Logical Reasoning

Shenghao Li

arXiv:2511.03372 · cs.CL · November 6, 2025

TL;DR

LFC-DA introduces a symbolic-logic-controlled data augmentation pipeline that enhances logical reasoning in models by systematically generating diverse, logically rigorous natural language questions through propositional logic and rule-based search.

Contribution

It presents a novel, interpretable data augmentation method that combines symbolic logic with large language models to improve logical reasoning accuracy.

Findings

1. Significant accuracy improvements on ReClor and LogiQA datasets.
2. Effective generation of diverse, logically rigorous questions.
3. Demonstrates the benefit of logic-controlled augmentation for LLMs.

Abstract

For complex logical data augmentation, heavy reliance on human annotation is costly, whereas direct generation with large language models yields uninterpretable and logically homogeneous examples. To address this, we present LFC-DA, a symbolic-logic-controlled pipeline: logical text is first mapped to propositional expressions, a compact rule library is compiled, and a bounded state-space search systematically discovers valid formulas that are then verbalized back into natural-language questions, ensuring both diversity and logical rigor under propositional logic. Experiments on ReClor and LogiQA show significant improvements in the logical-reasoning accuracy of pretrained models, confirming the effectiveness of LFC-DA for LLM-guided logical data augmentation.
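
The search step described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the tuple encoding of formulas, the four rewrite rules, and the names `rewrites`, `bounded_search`, `max_depth`, and `max_size` are all assumptions chosen for illustration. It shows one plausible reading of "bounded state-space search over a compact rule library": start from a propositional formula, apply equivalence-preserving rules breadth-first up to a depth limit, and keep every distinct formula reached. A truth-table check is included as one way to confirm that each discovered formula remains logically equivalent to the starting one.

```python
from collections import deque
from itertools import product

# Hypothetical encoding (an assumption, not from the paper):
# ("var", "p"), ("not", f), ("and", f, g), ("or", f, g), ("imp", f, g)

def rewrite_root(f):
    """Yield formulas equivalent to f by applying one rule at the root."""
    op = f[0]
    if op == "imp":
        a, b = f[1], f[2]
        yield ("imp", ("not", b), ("not", a))  # contraposition
        yield ("or", ("not", a), b)            # material implication
    elif op == "not":
        g = f[1]
        if g[0] == "not":
            yield g[1]                         # double-negation elimination
        elif g[0] == "and":
            yield ("or", ("not", g[1]), ("not", g[2]))   # De Morgan
        elif g[0] == "or":
            yield ("and", ("not", g[1]), ("not", g[2]))  # De Morgan

def rewrites(f):
    """Yield all single-step rewrites of f, at the root or in a subformula."""
    yield from rewrite_root(f)
    if f[0] == "not":
        for g in rewrites(f[1]):
            yield ("not", g)
    elif f[0] in ("and", "or", "imp"):
        for g in rewrites(f[1]):
            yield (f[0], g, f[2])
        for g in rewrites(f[2]):
            yield (f[0], f[1], g)

def size(f):
    """Number of nodes in the formula tree."""
    return 1 if f[0] == "var" else 1 + sum(size(c) for c in f[1:])

def bounded_search(start, max_depth=2, max_size=12):
    """Breadth-first search over rewrites, bounded by depth and formula size."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        f, depth = queue.popleft()
        if depth == max_depth:
            continue
        for g in rewrites(f):
            if g not in seen and size(g) <= max_size:
                seen.add(g)
                queue.append((g, depth + 1))
    return seen

def atoms(f):
    """Set of variable names occurring in f."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(atoms(c) for c in f[1:]))

def evaluate(f, env):
    """Evaluate f under a truth assignment env (name -> bool)."""
    op = f[0]
    if op == "var":
        return env[f[1]]
    if op == "not":
        return not evaluate(f[1], env)
    a, b = evaluate(f[1], env), evaluate(f[2], env)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    return (not a) or b  # "imp"

def equivalent(f, g):
    """Truth-table check that f and g agree under every assignment."""
    names = sorted(atoms(f) | atoms(g))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(f, env) != evaluate(g, env):
            return False
    return True

# Usage: explore formulas reachable from p -> q within two rewrite steps.
start = ("imp", ("var", "p"), ("var", "q"))
found = bounded_search(start, max_depth=2)
```

Each formula in `found` could then be handed to a language model for verbalization, which is the step the abstract describes next; the depth and size bounds keep the search space finite and the resulting questions short enough to phrase naturally.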

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications