TL;DR
GraphInstruct is a progressive-complexity benchmark that stratifies LLM graph generation into six complexity levels and five evaluation dimensions, with the aim of diagnosing and improving where along the complexity spectrum a model breaks down.
Contribution
It contributes a benchmark stratified by structural complexity, with 800 hand-authored instructions and accompanying solutions, enabling fine-grained diagnosis and improvement of LLM graph synthesis.
Findings
Discriminative power peaks at multi-constraint composition.
No single prompting strategy dominates across models.
Domain constraints are iteration-invariant, suggesting retrieval as the next frontier.
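To make "multi-constraint composition" concrete, here is a minimal sketch of checking a generated graph against several structural constraints at once. The constraint set (edge count, connectivity, degree bound), the function names, and the example graphs are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
from collections import deque


def is_connected(n, edges):
    """Breadth-first search over an undirected graph with nodes 0..n-1."""
    if n == 0:
        return True
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n


def check_constraints(n, edges, max_degree):
    """Report each (hypothetical) constraint separately so failures are attributable."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        "edge_count_is_n_minus_1": len(edges) == n - 1,
        "connected": is_connected(n, edges),
        "max_degree_ok": max(degree, default=0) <= max_degree,
    }


# Hypothetical instruction: "a tree on 5 nodes with maximum degree 2".
star = [(0, 1), (0, 2), (0, 3), (0, 4)]   # a tree, but the hub has degree 4
path = [(0, 1), (1, 2), (2, 3), (3, 4)]   # satisfies every constraint

star_report = check_constraints(5, star, max_degree=2)
path_report = check_constraints(5, path, max_degree=2)
print(star_report)
print(all(path_report.values()))
```

Reporting each constraint on its own, rather than a single pass/fail bit, shows which part of a composed instruction an output violates: the star graph passes two of the three checks and fails only the degree bound.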
Abstract
Graph-structured data underpins applications from citation analysis and social-network modeling to molecular design and knowledge-graph construction, and Large Language Models (LLMs) are increasingly used as prompt-driven graph synthesizers. Classical graph-generation reviews catalog deep generative models and their evaluation primitives, but predate the LLM era and provide no foundation for evaluating instruction-following graph synthesis. Recent LLM-era benchmarks evaluate models along graph-type or task-domain axes; such organizations, however, average over structural complexity and cannot localize where in the complexity spectrum an LLM breaks down. To close this diagnostic gap, we introduce GraphInstruct, a progressive-complexity benchmark that stratifies LLM graph generation into six complexity levels and five evaluation dimensions, paired with 800 hand-authored instructions,…
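The diagnostic gap described in the abstract, where averaging over structural complexity hides the point at which a model breaks down, can be illustrated with a small sketch. The level labels, pass/fail records, and function names below are made up for illustration; they are not results or code from the paper.

```python
from collections import defaultdict


def pooled_accuracy(results):
    """Single score averaged over all instructions, ignoring complexity level."""
    return sum(ok for _, ok in results) / len(results)


def accuracy_by_level(results):
    """Per-level accuracy, so the failing complexity level is visible."""
    totals = defaultdict(lambda: [0, 0])  # level -> [passed, total]
    for level, ok in results:
        totals[level][0] += int(ok)
        totals[level][1] += 1
    return {level: passed / total for level, (passed, total) in sorted(totals.items())}


# Hypothetical (complexity_level, passed) records for one model.
results = (
    [(1, True)] * 4
    + [(2, True)] * 3 + [(2, False)]
    + [(3, False)] * 4
)

print(pooled_accuracy(results))
print(accuracy_by_level(results))
```

In this toy data the pooled score is 7/12, which looks like moderate performance, while the stratified view shows perfect accuracy at level 1, 0.75 at level 2, and complete failure at level 3.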
