DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning
Fan Li, Xiaoyang Wang, Wenjie Zhang, Ying Zhang, Xuemin Lin

TL;DR
DHG-Bench is the first comprehensive benchmark for deep hypergraph neural networks, systematically evaluating 17 algorithms across multiple dimensions and datasets to advance understanding and development in hypergraph learning.
Contribution
The paper introduces DHG-Bench, a unified benchmarking framework with extensive evaluation protocols that cover the effectiveness, efficiency, robustness, and fairness of HNNs on diverse tasks and datasets.
Findings
Identified strengths and limitations of current HNN algorithms.
Provided insights into algorithm performance across various tasks.
Facilitated reproducible research with an open-source library.
Abstract
Deep graph models have achieved great success in network representation learning. However, their focus on pairwise relationships restricts their ability to learn pervasive higher-order interactions in real-world systems, which can be naturally modeled as hypergraphs. To tackle this issue, Hypergraph Neural Networks (HNNs) have garnered substantial attention in recent years. Despite the proposal of numerous HNNs, the absence of consistent experimental protocols and multi-dimensional empirical analysis impedes deeper understanding and further development of HNN research. While several toolkits for deep hypergraph learning (DHGL) have been introduced to facilitate algorithm evaluation, they provide only limited quantitative evaluation results and insufficient coverage of advanced algorithms, datasets, and benchmark tasks. To fill the gap, we introduce DHG-Bench, the first comprehensive…
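To illustrate the abstract's point that pairwise graphs cannot fully capture higher-order interactions, the sketch below builds a node-by-hyperedge incidence matrix and compares it with a clique expansion (one common way to reduce a hypergraph to an ordinary graph). It is a minimal illustration only and is not taken from DHG-Bench; the function names `incidence_matrix` and `clique_expansion` and the toy data are assumptions made for this example.

```python
from itertools import combinations


def incidence_matrix(num_nodes, hyperedges):
    """Return H as a list of rows, where H[v][e] = 1 if node v is in hyperedge e."""
    H = [[0] * len(hyperedges) for _ in range(num_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def clique_expansion(hyperedges):
    """Replace each hyperedge with all pairwise edges among its members."""
    edges = set()
    for members in hyperedges:
        for u, v in combinations(sorted(members), 2):
            edges.add((u, v))
    return edges


# Toy data (hypothetical): two different hypergraphs on the same three nodes.
one_triple = [{0, 1, 2}]                 # a single three-way interaction
three_pairs = [{0, 1}, {0, 2}, {1, 2}]   # three separate pairwise interactions

H_triple = incidence_matrix(3, one_triple)
H_pairs = incidence_matrix(3, three_pairs)

# The incidence matrices differ, yet both hypergraphs collapse to the same triangle.
same_graph = clique_expansion(one_triple) == clique_expansion(three_pairs)
```

Here the two hypergraphs have different incidence matrices but identical clique expansions, so a model that sees only the pairwise graph cannot tell one group interaction from three pairwise ones.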
Peer Reviews
Decision: ICLR 2026 Poster
- Hypergraph models suffer from inadequate evaluation, which slows down the advancement of the field. Moreover, most existing setups contain only node-level classification tasks. This paper represents a significant step forward in understanding the limitations of current models and provides a consistent, uniform framework for evaluating new models in a fair way.
- The paper points out a couple of limitations exhibited by current models, which represent good areas for future research. In particul
- The node classification setup used in the paper appears similar to that of ED-HNN. However, the reported results are noticeably lower. Is there any major difference in the training setup?
- I agree that structural robustness is an important metric. However, for models that explicitly take connectivity into account, not observing a drop in performance at a 90% perturbation ratio seems more like a negative result than a positive one. The paper presents robust performance across different perturb
1. **Comprehensive benchmark coverage.** DHG-Bench implements 17 HNN models and evaluates them on 22 datasets spanning node-, hyperedge-, and graph-level tasks.
2. **Multi-dimensional evaluation.** The benchmark goes beyond accuracy to assess efficiency, robustness, and fairness, providing a more holistic view of model behavior.
3. **Reproducibility focus.** The open-source code and datasets enable other researchers to replicate results and extend the benchmark.
4. **Insightful findings.** The e
1. **Limited conceptual novelty.** The benchmark aggregates existing models but does not introduce new methodologies or theoretical advances.
2. **Insufficient graph-based baselines.** Only two GNNs are included, and simple but strong baselines (e.g., direction-aware GNNs) are missing, making it difficult to quantify the advantage of hypergraphs.
3. **Directed hypergraphs ignored.** The benchmark only evaluates undirected hypergraphs, omitting variants that model asymmetric or causal relations,
S1. The authors provided a timely benchmark for hypergraph neural networks.
S2. The benchmark is comprehensive, in terms of both HNNs and downstream tasks.
S3. Most of the benchmark hypergraph datasets have been covered.
I do not have major criticisms of this work, but I have several suggestions.
**W1. [pip installation]** For now, I think one needs to download the GitHub repo to run the code. I think authors can improve the code to be easier to use, as in PyG (https://pytorch-geometric.readthedocs.io/en/2.4.0/install/installation.html).
**W2. [Label split]** While many HNNs use a 50/25/25 split for node classification, I personally think this ratio contains too many training nodes, compared to the common grap
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Data Classification · Machine Learning in Healthcare
