CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

Tianshi Zheng; Jiaxin Bai; Yicheng Wang; Tianqing Fang; Yue Guo; Yauwai Yim; Yangqiu Song

arXiv:2407.20564·cs.CL·July 31, 2024·1 cites

Open Access

TL;DR

This paper introduces a new benchmark to evaluate large language models' complex logical reasoning over general and biomedical knowledge, revealing strengths in general knowledge and challenges in specialized domains, especially with set intersections.

Contribution

It presents a systematic evaluation framework and benchmark for assessing LLMs' complex logical reasoning, highlighting their performance gaps and improvements with Chain-of-Thought prompting.

Findings

01. LLMs excel at reasoning over general knowledge.

02. LLMs struggle with domain-specific knowledge, especially in biomedical fields.

03. Chain-of-Thought prompting significantly improves reasoning performance.

Abstract

While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically reason with this knowledge in complex ways remains underexplored. In this work, we present a systematic evaluation of state-of-the-art LLMs' complex logical reasoning abilities through a novel benchmark of automatically generated complex reasoning questions over general domain and biomedical knowledge graphs. Our extensive experiments, employing diverse in-context learning techniques, reveal that LLMs excel at reasoning over general world knowledge but face significant challenges with specialized domain-specific knowledge. We find that prompting with explicit Chain-of-Thought demonstrations can substantially improve LLM performance on complex logical…
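
To make the kind of question the abstract describes more concrete, the sketch below answers a set-intersection query and a two-hop query over a tiny knowledge graph. It is only an illustration under stated assumptions: the triples, the relation names (`won`, `born_in`), and the helper functions (`heads_of`, `project`, `intersection_query`) are hypothetical and are not taken from the paper, whose actual question-generation pipeline is not described in this excerpt.

```python
from collections import defaultdict

# Toy knowledge graph as (head, relation, tail) triples.
# Assumption: a hand-written illustrative graph, not the benchmark's data.
TRIPLES = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "won", "Nobel Prize in Chemistry"),
    ("Linus Pauling", "won", "Nobel Prize in Chemistry"),
    ("Albert Einstein", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Poland"),
    ("Albert Einstein", "born_in", "Germany"),
]

# Index the graph in both directions for fast lookups.
FORWARD = defaultdict(set)   # (head, relation) -> set of tails
INVERSE = defaultdict(set)   # (relation, tail) -> set of heads
for head, relation, tail in TRIPLES:
    FORWARD[(head, relation)].add(tail)
    INVERSE[(relation, tail)].add(head)


def heads_of(relation, tail):
    """Return every entity e such that (e, relation, tail) is in the graph."""
    return set(INVERSE.get((relation, tail), set()))


def project(entities, relation):
    """Follow `relation` from each entity and return the union of the tails."""
    result = set()
    for entity in entities:
        result |= FORWARD.get((entity, relation), set())
    return result


def intersection_query(conditions):
    """Return entities satisfying all (relation, tail) conditions at once."""
    candidate_sets = [heads_of(relation, tail) for relation, tail in conditions]
    return set.intersection(*candidate_sets) if candidate_sets else set()


# Intersection: "Who won both the Physics and the Chemistry prize?"
both = intersection_query([
    ("won", "Nobel Prize in Physics"),
    ("won", "Nobel Prize in Chemistry"),
])

# Two-hop: "Where were the winners of the Physics prize born?"
birthplaces = project(heads_of("won", "Nobel Prize in Physics"), "born_in")

print(both)
print(birthplaces)
```

A benchmark of this style would presumably turn such queries into natural-language questions and compare a model's answer against the answer set computed from the graph; that last step is omitted here because the excerpt does not specify how it is done.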

Taxonomy

Topics: Topic Modeling · Data Quality and Management · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training