Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

Sunil Kumar Maurya; Xin Liu

arXiv:2605.01484 · cs.LG · May 5, 2026

TL;DR

This paper introduces EstGraph, a benchmark (dataset and tasks) for evaluating LLMs on property estimation for large-scale graphs, using random-walk sampling so that the graph information given to the model fits within its context length.
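To make the random-walk sampling idea concrete, here is a minimal sketch of one classic estimator from the graph-sampling literature; it is not presented as the method used in EstGraph. It assumes a connected, undirected graph stored as a Python dict of adjacency lists, and the function names `random_walk` and `estimate_average_degree` are hypothetical. A simple random walk only reads the neighbor lists of the nodes it visits, which suits graphs that are not fully accessible, but in the long run it visits each node in proportion to that node's degree. The harmonic-mean estimator below corrects that bias when estimating the average degree.

```python
import random


def random_walk(adj, start, steps, rng):
    """Simple random walk (hypothetical helper): at each step, move to a
    neighbor chosen uniformly at random. Only the neighbor lists of visited
    nodes are read, so the full graph never needs to be loaded."""
    node = start
    walk = [node]
    for _ in range(steps):
        node = rng.choice(adj[node])
        walk.append(node)
    return walk


def estimate_average_degree(adj, walk):
    """Harmonic-mean estimator (assumes a connected, undirected graph).

    The walk visits node v with long-run frequency proportional to deg(v),
    so the plain mean of sampled degrees overestimates the average degree.
    Averaging 1/deg(v) over the walk and inverting the result cancels
    that degree bias."""
    inverse_degrees = [1.0 / len(adj[v]) for v in walk]
    return len(inverse_degrees) / sum(inverse_degrees)


if __name__ == "__main__":
    # 10-node cycle: every node has degree 2.
    cycle = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    w = random_walk(cycle, start=0, steps=500, rng=random.Random(0))
    print(estimate_average_degree(cycle, w))
```

On the 10-node cycle every node has degree 2, so the estimate is exactly 2.0. On a star graph with one center and four leaves, the estimate approaches the true average degree of 1.6 as the walk grows, even though the walk spends about half its steps on the high-degree center.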

Contribution

It proposes a new benchmark and prompting strategies for assessing LLM reasoning on large, partially accessible graphs, addressing a limitation of existing graph benchmarks, whose graphs are kept very small to fit within LLM context limits.

Findings

01

LLMs can estimate properties of large graphs using random-walk-based prompts.

02

The proposed prompts enable LLMs to handle graphs with millions of nodes.

03

The benchmark reveals strengths and limitations of LLM reasoning on large graph data.
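The findings above concern prompts built from random walks. One hypothetical way to turn a walk into a prompt that respects a context budget is sketched below. The prompt wording, the `walk_to_prompt` name, and the use of a character count as a rough stand-in for a token limit are all assumptions for illustration, not the paper's prompt templates.

```python
def walk_to_prompt(walk, adj, question, max_chars):
    """Hypothetical prompt builder: lists each visited node with its degree,
    truncating the walk so the whole prompt stays within max_chars.
    A character budget is used here as a rough stand-in for a token limit."""
    header = (
        "You are given a random walk on a large graph. "
        "Each line lists a visited node and its degree.\n"
    )
    footer = "\nQuestion: " + question
    budget = max_chars - len(header) - len(footer)
    if budget < 0:
        raise ValueError("max_chars is too small for the header and question")

    lines, used = [], 0
    for step, node in enumerate(walk):
        line = f"{step}: node {node}, degree {len(adj[node])}\n"
        if used + len(line) > budget:
            break  # stop before exceeding the budget; later steps are dropped
        lines.append(line)
        used += len(line)
    return header + "".join(lines) + footer


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(walk_to_prompt([0, 1, 2, 0, 2], triangle, "Estimate the average degree.", 200))
```

Keeping only the first steps of a single walk is one simple truncation policy; sampling several shorter walks instead would trade walk length for more independent starting points.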

Abstract

With the rapidly improving reasoning abilities of Large Language Models (LLMs), there is also a rising demand to use them in a wide variety of domains. This brings about the need to carefully evaluate the limits of the capabilities of these models with various tests and benchmarks. Graph structures are ubiquitous in real-world data, and are often used to represent and analyze relationship patterns within data. Many benchmarks have already been proposed in the graph literature to test the reasoning ability of LLMs to follow and execute graph algorithms. However, due to the limited context length of LLMs, these benchmarks consist of very small graphs. In real-world data, the size of graphs can be significantly larger, and in many cases, not fully accessible. In this paper, we examine a class of problems that arises with very large graphs having limited accessibility. We propose a large…