LLMs Prompted for Graphs: Hallucinations and Generative Capabilities
Gurvan Richardeau, Samy Chali, Erwan Le Merrer, Camilla Penzo, Gilles Tredan

TL;DR
This paper evaluates large language models' ability to recall and generate graphs, revealing their hallucination tendencies and emergent generative capabilities, and introduces metrics to assess their accuracy and reliability.
Contribution
It introduces a novel framework and metrics for assessing LLMs' graph recall and generation, highlighting their hallucination behaviors and potential emergent capabilities.
Findings
Hallucination amplitude correlates with LLM quality.
Most LLMs show good reproducibility in graph generation.
Graph hallucinations can serve as a benchmark for LLM evaluation.
Abstract
Large Language Models (LLMs) are nowadays prompted for a wide variety of tasks. In this article, we investigate their ability to recite and generate graphs. We first study the ability of LLMs to regurgitate well-known graphs from the literature (e.g. Karate club or the graph atlas). Secondly, we question the generative capabilities of LLMs by asking for Erdos-Renyi random graphs. As opposed to the possibility that they could memorize some Erdos-Renyi graphs included in their scraped training set, this second investigation aims at studying a possible emergent property of LLMs. For both tasks, we propose a metric to assess their errors through the lens of hallucination (i.e. incorrect information returned as facts). We most notably find that the amplitude of graph hallucinations can characterize the superiority of some LLMs. Indeed, for the recitation task, we observe that graph…
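To make the two tasks in the abstract concrete, here is a minimal Python sketch, under stated assumptions. The first function samples an Erdos-Renyi G(n, p) graph, where each of the n(n-1)/2 possible edges is included independently with probability p. The second counts the edges on which a returned graph and a reference graph disagree. That count is only an illustrative stand-in: the excerpt above does not define the paper's hallucination metric, and the names `erdos_renyi_edges`, `normalize`, and `edge_mismatch` are hypothetical.

```python
import itertools
import random


def erdos_renyi_edges(n, p, seed=None):
    """Sample the edge set of an undirected G(n, p) graph on nodes 0..n-1."""
    rng = random.Random(seed)
    # Each unordered pair (u, v) with u < v is kept independently with probability p.
    return {
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    }


def normalize(edges):
    """Treat edges as undirected: sort endpoints and drop self-loops."""
    return {tuple(sorted(e)) for e in edges if e[0] != e[1]}


def edge_mismatch(reference, returned):
    """Count edges present in exactly one of the two graphs.

    Hypothetical error measure for illustration only; it is NOT
    the metric proposed in the paper.
    """
    return len(normalize(reference) ^ normalize(returned))


if __name__ == "__main__":
    reference = erdos_renyi_edges(n=10, p=0.3, seed=0)
    # Simulate an imperfect answer: drop one true edge (if any) and add a spurious one.
    returned = set(reference)
    if returned:
        returned.pop()
    returned.add((0, 0))  # self-loop, ignored by normalize
    print("mismatching edges:", edge_mismatch(reference, returned))
```

For the recitation task, the same comparison would apply with the reference edge set taken from a known graph such as the Karate club network instead of a sampled one.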
