Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars

Damien Sileo

arXiv:2406.11035·cs.CL·June 18, 2024

TL;DR

This paper introduces a flexible declarative framework for generating synthetic logical reasoning datasets. Relatively small DeBERTa-v3 models trained on the generated data reach state-of-the-art accuracy on logic tasks.

Contribution

It presents a general, context-sensitive rule-based approach for dataset generation that enhances reasoning capabilities and extends to multiple languages, surpassing prior domain-specific methods.

Findings

01

Achieved state-of-the-art accuracy on the FOLIO dataset with small DeBERTa-v3 models.

02

Semantic constraints during generation and careful English verbalization of predicates improve logical reasoning performance without hurting natural English tasks.

03

Surpassed GPT-4 in accuracy by 12% on the FOLIO logic dataset.

Abstract

Logical reasoning remains a challenge for natural language processing, but it can be improved by training language models to mimic theorem provers on procedurally generated problems. Previous work used domain-specific proof generation algorithms, which biases reasoning toward specific proof traces and limits auditability and extensibility. We present a simpler and more general declarative framework with flexible context-sensitive rules binding multiple languages (specifically, simplified English and the TPTP theorem-proving language). We construct first-order logic problems by selecting up to 32 premises and one hypothesis. We demonstrate that using semantic constraints during generation and careful English verbalization of predicates enhances logical reasoning without hurting natural English tasks. We use relatively small DeBERTa-v3 models to achieve state-of-the-art accuracy on the…
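The generation scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the API of the sileod/unigram repository: the `Formula` class, the rule functions, the vocabulary, and the helper names are all hypothetical, chosen only to show the idea of rules that emit a TPTP rendering and a simplified English rendering together, with simple constraints applied during generation. Labeling each problem (whether the hypothesis follows from the premises) would require a theorem prover and is left out.

```python
import random
from dataclasses import dataclass

MAX_PREMISES = 32  # upper bound on premises stated in the abstract


@dataclass(frozen=True)
class Formula:
    """One generated formula, rendered in both target languages (hypothetical type)."""
    tptp: str
    english: str


# Illustrative vocabulary; these words are assumptions, not taken from the paper.
PREDICATES = ["tall", "quiet", "kind", "brave", "old"]
NAMES = ["mary", "paul", "fred"]


def atom(rng):
    pred, name = rng.choice(PREDICATES), rng.choice(NAMES)
    return Formula(f"{pred}({name})", f"{name.capitalize()} is {pred}")


def negated_atom(rng):
    pred, name = rng.choice(PREDICATES), rng.choice(NAMES)
    return Formula(f"~{pred}({name})", f"{name.capitalize()} is not {pred}")


def universal_rule(rng):
    # Constraint inside the rule: the two predicates must differ,
    # which rules out the trivial "everyone who is tall is tall".
    p, q = rng.sample(PREDICATES, 2)
    return Formula(f"![X]: ({p}(X) => {q}(X))", f"Everyone who is {p} is {q}")


RULES = [atom, negated_atom, universal_rule]


def generate_problem(rng, n_premises):
    """Sample n_premises distinct premises and one hypothesis."""
    if not 1 <= n_premises <= MAX_PREMISES:
        raise ValueError(f"n_premises must be between 1 and {MAX_PREMISES}")
    premises, seen = [], set()
    while len(premises) < n_premises:
        candidate = rng.choice(RULES)(rng)
        # Constraint that depends on what was already generated:
        # skip a premise that duplicates an earlier one.
        if candidate.tptp not in seen:
            seen.add(candidate.tptp)
            premises.append(candidate)
    hypothesis = rng.choice([atom, negated_atom])(rng)
    return premises, hypothesis


def to_tptp(premises, hypothesis):
    """Render the problem as TPTP first-order (fof) statements."""
    lines = [f"fof(p{i}, axiom, {f.tptp})." for i, f in enumerate(premises)]
    lines.append(f"fof(hyp, conjecture, {hypothesis.tptp}).")
    return "\n".join(lines)


def to_english(premises, hypothesis):
    """Render the same problem as a premise text and a hypothesis sentence."""
    premise_text = " ".join(f.english + "." for f in premises)
    return premise_text, hypothesis.english + "."


rng = random.Random(0)
premises, hypothesis = generate_problem(rng, 4)
print(to_tptp(premises, hypothesis))
print(to_english(premises, hypothesis))
```

Because each rule returns both renderings at once, the TPTP problem and the English premise/hypothesis pair stay aligned by construction, which is the property the sketch is meant to show. The vocabulary is large enough to yield 50 distinct formulas, so the duplicate check still terminates at the 32-premise limit.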

Code & Models

Repositories

sileod/unigram
Official

Datasets

tasksource/FOL-nli
dataset · 54 downloads

Videos

Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars· underline

Taxonomy

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer