Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Yifan Feng, Chengwu Yang, Xingliang Hou, Shaoyi Du, Shihui Ying, Zongze Wu, Yue Gao

TL;DR
This paper introduces a comprehensive benchmark and novel techniques to evaluate and enhance large language models' ability to understand and reason about hypergraphs, which model complex high-order relationships.
Contribution
The work presents the first extensive hypergraph benchmark for LLMs, along with new prompting methods Hyper-BAG and Hyper-COT that improve high-order reasoning performance.
Findings
Benchmark with 21,500 problems across various hypergraph tasks.
Evaluation of six prominent LLMs demonstrating model strengths and weaknesses.
Performance improvements of up to 9% on structure classification tasks.
Abstract
Existing benchmarks like NLGraph and GraphQA evaluate LLMs on graphs by focusing mainly on pairwise relationships, overlooking the high-order correlations found in real-world data. Hypergraphs, which can model complex beyond-pairwise relationships, offer a more robust framework but are still underexplored in the context of LLMs. To address this gap, we introduce LLM4Hypergraph, the first comprehensive benchmark comprising 21,500 problems across eight low-order, five high-order, and two isomorphism tasks, utilizing both synthetic and real-world hypergraphs from citation networks and protein structures. We evaluate six prominent LLMs, including GPT-4o, demonstrating our benchmark's effectiveness in identifying model strengths and weaknesses. Our specialized prompting framework incorporates seven hypergraph languages and introduces two novel techniques, Hyper-BAG and Hyper-COT, which…
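To make the contrast between pairwise and beyond-pairwise structure concrete, here is a minimal sketch in Python. It is an illustration only: the toy hypergraph, the sentence-per-hyperedge text format, and all function names are assumptions made for this example, and none of them reproduce the paper's seven hypergraph languages or its task definitions.

```python
from itertools import combinations

# Hypothetical toy hypergraph: each hyperedge is a set of vertex ids.
hyperedges = [{0, 1, 2}, {2, 3}, {1, 2, 3, 4}]


def describe_hypergraph(hyperedges):
    """Render a hypergraph as plain text, one sentence per hyperedge.

    The wording is an assumed format, not one of the paper's languages.
    """
    vertices = sorted(set().union(*hyperedges))
    lines = [
        f"The hypergraph has {len(vertices)} vertices "
        f"and {len(hyperedges)} hyperedges."
    ]
    for i, edge in enumerate(hyperedges):
        members = ", ".join(f"v{v}" for v in sorted(edge))
        lines.append(f"Hyperedge e{i} connects vertices {members}.")
    return "\n".join(lines)


def clique_expansion(hyperedges):
    """Flatten every hyperedge into pairwise edges.

    The result is an ordinary graph; the information about which
    vertices were grouped together in one hyperedge is lost.
    """
    pairs = set()
    for edge in hyperedges:
        pairs.update(combinations(sorted(edge), 2))
    return sorted(pairs)


def vertex_degree(hyperedges, v):
    """Count the hyperedges that contain vertex v."""
    return sum(v in edge for edge in hyperedges)


def shared_vertices(hyperedges, i, j):
    """Return the vertices that hyperedges i and j have in common."""
    return sorted(hyperedges[i] & hyperedges[j])


if __name__ == "__main__":
    print(describe_hypergraph(hyperedges))
    print(clique_expansion(hyperedges))
    print(vertex_degree(hyperedges, 2))
    print(shared_vertices(hyperedges, 0, 2))
```

A question such as "how many hyperedges contain v2?" can be answered from either view, whereas "which vertices do e0 and e2 share?" needs the hyperedge grouping that the pairwise expansion discards. That loss is the kind of high-order information the abstract says graph-only benchmarks overlook.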
Peer Reviews
Decision·ICLR 2025 Poster
Originality: The paper proposes a new benchmark and prompting techniques tailored for hypergraphs, addressing a gap in the assessment of LLMs' capabilities. Quality: The benchmark is comprehensive, covering a wide range of tasks and hypergraph types, which strengthens the validity of the findings. Clarity: The paper is well-organized, with clear explanations of the hypergraph languages and prompting frameworks. Significance: The work is significant as it pushes the boundaries of LLMs' understand
The paper could benefit from a deeper analysis of the limitations of the current LLMs in handling hypergraphs, beyond performance metrics. While the benchmark is comprehensive, it may lack diversity in terms of the types of real-world hypergraphs used, which could affect the generalizability of the findings.
The paper is easy to read and the experiments are comprehensive and thorough.
**Main arguments**: 1. The paper adapts existing benchmarks and prompting techniques for hypergraphs. While the results offer some insights into the extent to which LLMs understand hypergraphs, they largely mirror findings for simple graphs---specifically, that CoT and BAG can enhance LLM performance. The only notable point is that using suitable language to describe hypergraphs can aid LLM comprehension, which is novel but trivial. Given that the proposed techniques are naive adaptations of exi
- This paper proposes the first benchmark for evaluating LLMs on hypergraphs.
- The authors thoroughly address questions about hypergraphs.
- The problems are well-structured and clearly categorized according to their objectives.
- The code is released for reproducibility.
- The motivations for this research are not sufficiently discussed. Why is it important to enable LLMs to understand hypergraph structures? Are there potential practical use cases? Are there any motivations beyond the fact that similar research has been done with graphs?
- The datasets used in the study are not comprehensive. To be specific:
  - The definition of "hypergraph size" is unclear. Is it referring to the number of nodes, the number of hyperedges, or the sum of hyperedge sizes?
  - The
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
