Are Large Language Models Capable of Deep Relational Reasoning? Insights from DeepSeek-R1 and Benchmark Comparisons

Chi Chiu So; Yueyue Sun; Jun-Min Wang; Siu Pang Yung; Anthony Wai Keung Loh; Chun Pong Chau

arXiv:2506.23128·cs.AI·July 1, 2025

TL;DR

This paper evaluates how well three large language models (DeepSeek-R1, DeepSeek-V3, and GPT-4o) handle deep relational reasoning on family tree and general graph tasks. The models show real strength in logical inference, but performance drops sharply as problem complexity increases.

Contribution

The paper introduces a suite of benchmark tasks for deep relational reasoning, covering family tree and general graph problems, and uses it to compare the strengths and weaknesses of DeepSeek-R1, DeepSeek-V3, and GPT-4o.
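To make the family tree setting concrete, here is a minimal sketch of how a reference answer for one relational query could be derived from parent-child facts. The task format, the `grandparent_pairs` helper, and the names in `facts` are all hypothetical illustrations, not the paper's actual benchmark.

```python
def grandparent_pairs(parent_of):
    """Return all (grandparent, grandchild) pairs implied by (parent, child) facts."""
    # Index children by parent so two-hop lookups are cheap.
    children = {}
    for parent, child in parent_of:
        children.setdefault(parent, set()).add(child)

    pairs = set()
    for grandparent, kids in children.items():
        for kid in kids:
            # Every child of a child is a grandchild.
            for grandchild in children.get(kid, ()):
                pairs.add((grandparent, grandchild))
    return pairs


# Hypothetical facts: (parent, child)
facts = [("Ann", "Bob"), ("Bob", "Cara"), ("Bob", "Dan"), ("Eve", "Ann")]
reference = grandparent_pairs(facts)
print(sorted(reference))
```

A model's answer to the same query could then be compared against `reference`. Deeper relations (for example, cousins) would need longer chains of the same kind of composition, which is one plausible reason difficulty grows with problem size.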

Findings

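01. DeepSeek-R1 consistently achieves the highest F1-scores across multiple tasks and problem sizes.

02. All models, including DeepSeek-R1, struggle significantly as problem complexity increases, largely due to token length limitations and incomplete output structures.

03. DeepSeek-R1's long Chain-of-Thought responses reveal its planning and verification strategies.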

Abstract

How far are Large Language Models (LLMs) in performing deep relational reasoning? In this paper, we evaluate and compare the reasoning capabilities of three cutting-edge LLMs, namely, DeepSeek-R1, DeepSeek-V3 and GPT-4o, through a suite of carefully designed benchmark tasks in family tree and general graph reasoning. Our experiments reveal that DeepSeek-R1 consistently achieves the highest F1-scores across multiple tasks and problem sizes, demonstrating strong aptitude in logical deduction and relational inference. However, all evaluated models, including DeepSeek-R1, struggle significantly as problem complexity increases, largely due to token length limitations and incomplete output structures. A detailed analysis of DeepSeek-R1's long Chain-of-Thought responses uncovers its unique planning and verification strategies, but also highlights instances of incoherent or incomplete…
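Since the results are reported as F1-scores, a small sketch of a set-based F1 may help show why incomplete outputs are costly. This assumes answers are scored as sets of predicted relation pairs against a gold set; the abstract does not specify the paper's exact scoring procedure, and `f1_score` and the example pairs are hypothetical.

```python
def f1_score(predicted, gold):
    """Set-based F1 between predicted and gold relation pairs (assumed scoring scheme)."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold answer and a truncated model output.
gold = {("Ann", "Cara"), ("Ann", "Dan"), ("Eve", "Bob")}
truncated = {("Ann", "Cara")}  # output cut off after the first pair

print(f1_score(truncated, gold))
```

Under this assumption, a truncated answer can have perfect precision and still score poorly because recall collapses, which is consistent with the abstract's point about token length limits and incomplete output structures.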

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)