FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models

Yiyuan Li; Shichao Sun; Pengfei Liu

arXiv:2407.01046·cs.AI·July 4, 2024

TL;DR

This paper introduces FRoG, a benchmark for evaluating fuzzy reasoning in large language models using real-world mathematical problems with generalized quantifiers, revealing current limitations and inverse scaling effects.

Contribution

The paper presents FRoG, the first benchmark specifically designed to assess fuzzy reasoning in LLMs. It highlights the difficulty these models have with such reasoning and shows that strong mathematical reasoning skills do not necessarily translate into success on the benchmark.

Findings

1. LLMs struggle with fuzzy reasoning tasks in FRoG.
2. Existing reasoning enhancement methods do not reliably improve performance.
3. Performance decreases as models scale up, showing an inverse scaling effect.

Abstract

Fuzzy reasoning is vital due to the frequent use of imprecise information in daily contexts. However, the ability of current large language models (LLMs) to handle such reasoning remains largely uncharted. In this paper, we introduce a new benchmark, FRoG, for fuzzy reasoning, featuring real-world mathematical word problems that incorporate generalized quantifiers. Our experimental findings reveal that fuzzy reasoning continues to pose significant challenges for LLMs. Moreover, we find that existing methods designed to enhance reasoning do not consistently improve performance in tasks involving fuzzy logic. Additionally, our results show an inverse scaling effect in the performance of LLMs on FRoG. Interestingly, we also demonstrate that strong mathematical reasoning skills are not necessarily indicative of success on our benchmark.

Code & Models

Repositories

nativeatom/frog (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques