Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models

Xin Li; Weize Chen; Qizhi Chu; Haopeng Li; Zhaojun Sun; Ran Li; Chen Qian; Yiwei Wei; Zhiyuan Liu; Chuan Shi; Maosong Sun; Cheng Yang

arXiv:2409.19667·cs.CL·November 4, 2025

TL;DR

This paper introduces ProGraph, a benchmark for evaluating large language models' ability to analyze graphs through programming tasks, revealing current limitations and proposing datasets and methods to improve their performance.

Contribution

The paper presents ProGraph, a novel benchmark for graph analysis by LLMs, and introduces LLM4Graph datasets with code augmentation to enhance LLM capabilities.

Findings

01. Current LLMs achieve only 36% accuracy on graph tasks.

02. Augmenting models with code and document retrieval improves accuracy by 11-32%.

03. Structured data analysis remains a challenging area for LLMs.

Abstract

The need to analyze graphs is ubiquitous across various fields, from social networks to biological research and recommendation systems. Therefore, enabling large language models (LLMs) to process graphs is an important step toward more advanced general intelligence. However, current LLM benchmarks on graph analysis require models to reason directly over prompts describing graph topology, and are thus limited to small graphs with only a few dozen nodes. In contrast, human experts typically write programs based on popular libraries to solve tasks, and can thus handle graphs of different scales. To this end, a question naturally arises: can LLMs analyze graphs like professionals? In this paper, we introduce ProGraph, a manually crafted benchmark containing 3 categories of graph tasks. The benchmark expects solutions based on programming instead of directly…
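To illustrate the contrast the abstract draws, here is a minimal sketch of the programming-based approach. It assumes NetworkX as one example of a "popular library"; the excerpt above does not name the libraries ProGraph targets, and the `analyze` helper and the two example graphs are hypothetical, not taken from the benchmark. The point is only that the same few lines of code apply to a 5-node graph and to a 10,000-node graph, whereas a prompt that spells out topology edge by edge would grow with the graph.

```python
import networkx as nx


def analyze(graph: nx.Graph) -> dict:
    """Return a few standard statistics for an undirected graph.

    Hypothetical helper for illustration; not part of ProGraph.
    """
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "components": nx.number_connected_components(graph),
        "max_degree": max(degree for _, degree in graph.degree()),
    }


# A graph small enough to describe in a prompt: a path on 5 nodes.
small = nx.path_graph(5)

# A graph too large to spell out edge by edge in a prompt:
# a 100 x 100 grid, i.e. 10,000 nodes.
large = nx.grid_2d_graph(100, 100)

small_stats = analyze(small)
large_stats = analyze(large)

# Number of edges on a shortest path between opposite corners of the grid.
corner_distance = nx.shortest_path_length(large, (0, 0), (99, 99))

print(small_stats)
print(large_stats)
print(corner_distance)
```

The code does not change as the graph grows; only the input does. That is the property the abstract attributes to how human experts work.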

Code & Models

Models

🤗 lixin4sky/ProGraph (model)

Videos

Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models (SlidesLive)

Taxonomy

Topics: Topic Modeling