Large Language Models Still Face Challenges in Multi-Hop Reasoning with External Knowledge

Haotong Zhang

arXiv:2412.08317·cs.CL·December 12, 2024

Open Access

TL;DR

This paper evaluates GPT-3.5's multi-hop reasoning capabilities, revealing significant limitations in knowledge integration, non-sequential reasoning, and scalability, despite high performance on various benchmarks.

Contribution

It systematically assesses large language models' multi-hop reasoning abilities across multiple aspects, highlighting existing challenges and gaps compared to human reasoning.

Findings

01

GPT-3.5 struggles with multi-hop reasoning tasks

02

Models have difficulty generalizing to more complex, longer reasoning chains

03

Significant gap remains between model performance and human reasoning abilities

Abstract

We carry out a series of experiments to test large language models' multi-hop reasoning ability from three aspects: selecting and combining external knowledge, dealing with non-sequential reasoning tasks, and generalising to data samples with larger numbers of hops. We test the GPT-3.5 model on four reasoning benchmarks with Chain-of-Thought prompting (and its variations). Our results reveal that, despite the strong performance large language models achieve on various reasoning tasks, they still suffer from severe drawbacks, which points to a large gap between models and humans.
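As a rough illustration of the setup the abstract describes, the sketch below assembles a few-shot Chain-of-Thought prompt in which the model must select the relevant facts from a pool of external knowledge and chain them over two hops. The function name `build_cot_prompt`, the prompt layout, and the toy facts are all assumptions made for illustration; they are not taken from the paper, whose exact prompts and benchmarks are not given on this page.

```python
from typing import List, Tuple


def build_cot_prompt(
    question: str,
    facts: List[str],
    exemplars: List[Tuple[str, str, str]],
) -> str:
    """Assemble a few-shot Chain-of-Thought prompt (hypothetical layout).

    Each exemplar is (question, step-by-step reasoning, answer).
    `facts` is the pool of external knowledge and may contain
    distractors that the model has to ignore.
    """
    parts = []
    for ex_question, ex_reasoning, ex_answer in exemplars:
        parts.append(
            f"Question: {ex_question}\n"
            f"Reasoning: {ex_reasoning}\n"
            f"Answer: {ex_answer}\n"
        )
    # Number the facts so the reasoning can refer to them explicitly.
    numbered = "\n".join(f"{i}. {fact}" for i, fact in enumerate(facts, start=1))
    parts.append(
        f"Facts:\n{numbered}\n"
        f"Question: {question}\n"
        "Reasoning: Let's think step by step."
    )
    return "\n".join(parts)


# Toy two-hop question: facts 1 and 3 are needed, fact 2 is a distractor.
prompt = build_cot_prompt(
    question="What is the job of Alice's mother?",
    facts=[
        "Carol is the mother of Alice.",
        "Bob is the brother of Alice.",
        "Carol works as a doctor.",
    ],
    exemplars=[
        (
            "Tom's father is Sam. Sam lives in Oslo. Where does Tom's father live?",
            "Tom's father is Sam. Sam lives in Oslo. So Tom's father lives in Oslo.",
            "Oslo",
        )
    ],
)
print(prompt)
```

Increasing the number of facts that must be chained, or shuffling their order so they no longer appear in reasoning order, would give simple probes in the spirit of the hop-count and non-sequential settings mentioned above.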

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Attention Dropout · Softmax · Cosine Annealing · Byte Pair Encoding · Linear Layer · Linear Warmup With Cosine Annealing · Multi-Head Attention