DHG-Bench: A Comprehensive Benchmark for Deep Hypergraph Learning

Fan Li; Xiaoyang Wang; Wenjie Zhang; Ying Zhang; Xuemin Lin

arXiv:2508.12244 · cs.LG · September 30, 2025

TL;DR

DHG-Bench is the first comprehensive benchmark for deep hypergraph neural networks, systematically evaluating 17 algorithms across multiple dimensions and datasets to advance understanding and development in hypergraph learning.

Contribution

It introduces DHG-Bench, a unified benchmarking framework whose evaluation protocols cover the effectiveness, efficiency, robustness, and fairness of hypergraph neural networks (HNNs) across diverse tasks and datasets.
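Fairness in node classification is often summarized by two group-level gaps. The sketch below is an assumption, not DHG-Bench's code: it computes the statistical parity difference (ΔSP) and equal opportunity difference (ΔEO) commonly used in fair graph learning, for binary predictions and a binary sensitive attribute. The metrics DHG-Bench actually reports may differ, and `fairness_gaps` is a hypothetical helper.

```python
import numpy as np


def fairness_gaps(y_pred, y_true, s):
    """Hypothetical helper: statistical parity and equal opportunity gaps.

    y_pred, y_true: binary predictions and labels per node.
    s: binary sensitive attribute per node (group 0 vs. group 1).
    Each group, and each group's positive-label subset, must be non-empty;
    otherwise the corresponding mean is NaN.
    """
    y_pred, y_true, s = (np.asarray(a).astype(bool) for a in (y_pred, y_true, s))
    # Statistical parity: |P(y_hat = 1 | s = 0) - P(y_hat = 1 | s = 1)|
    delta_sp = abs(y_pred[~s].mean() - y_pred[s].mean())
    # Equal opportunity: gap in true positive rate, P(y_hat = 1 | y = 1, s)
    delta_eo = abs(y_pred[~s & y_true].mean() - y_pred[s & y_true].mean())
    return float(delta_sp), float(delta_eo)


# Toy example: 8 nodes, the first four in group 0, the last four in group 1.
dsp, deo = fairness_gaps(
    y_pred=[1, 1, 0, 0, 1, 0, 0, 0],
    y_true=[1, 0, 1, 0, 1, 1, 0, 0],
    s=[0, 0, 0, 0, 1, 1, 1, 1],
)
print(dsp, deo)  # 0.25 0.0
```

Here the two groups receive positive predictions at different rates (ΔSP = 0.25) while their true positive rates match (ΔEO = 0), which is why the two gaps are usually reported together.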

Findings

01. Identified strengths and limitations of current HNN algorithms.

02. Provided insights into algorithm performance across various tasks.

03. Facilitated reproducible research with an open-source library.

Abstract

Deep graph models have achieved great success in network representation learning. However, their focus on pairwise relationships restricts their ability to learn pervasive higher-order interactions in real-world systems, which can be naturally modeled as hypergraphs. To tackle this issue, Hypergraph Neural Networks (HNNs) have garnered substantial attention in recent years. Despite the proposal of numerous HNNs, the absence of consistent experimental protocols and multi-dimensional empirical analysis impedes deeper understanding and further development of HNN research. While several toolkits for deep hypergraph learning (DHGL) have been introduced to facilitate algorithm evaluation, they provide only limited quantitative evaluation results and insufficient coverage of advanced algorithms, datasets, and benchmark tasks. To fill the gap, we introduce DHG-Bench, the first comprehensive…
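To make "higher-order interactions" concrete: a hypergraph with n nodes and m hyperedges can be stored as an n×m incidence matrix H, and many HNNs propagate features through it. The sketch below implements one convolution in the style of the HGNN layer of Feng et al. (2019), X' = ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Θ), as a representative example. It is a NumPy illustration, not code from DHG-Bench, and the name `hgnn_conv` is hypothetical.

```python
import numpy as np


def hgnn_conv(X, H, Theta, w=None):
    """One HGNN-style hypergraph convolution with ReLU activation.

    X: (n, d) node features; H: (n, m) 0/1 incidence matrix;
    Theta: (d, k) weight matrix; w: (m,) hyperedge weights (default: all ones).
    Computes ReLU(Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta).
    """
    H = np.asarray(H, dtype=float)
    m = H.shape[1]
    w = np.ones(m) if w is None else np.asarray(w, dtype=float)
    dv = H @ w            # weighted node degrees
    de = H.sum(axis=0)    # hyperedge sizes
    # Inverse (square-root) degrees; isolated nodes and empty hyperedges map to 0.
    dv_inv_sqrt = np.power(dv, -0.5, out=np.zeros_like(dv), where=dv > 0)
    de_inv = np.divide(1.0, de, out=np.zeros_like(de), where=de > 0)
    # Node -> hyperedge -> node propagation matrix, shape (n, n).
    left = dv_inv_sqrt[:, None] * H * (w * de_inv)[None, :]
    P = (left @ H.T) * dv_inv_sqrt[None, :]
    return np.maximum(P @ X @ Theta, 0.0)


# Three nodes, two hyperedges: e0 = {0, 1}, e1 = {1, 2}.
H = np.array([[1, 0], [1, 1], [0, 1]])
out = hgnn_conv(np.eye(3), H, np.eye(3))
print(np.round(out, 3))
```

With identity features and weights, the output is the propagation matrix itself: nodes 0 and 2 never share a hyperedge, so they exchange information only through node 1.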

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- Hypergraph models suffer from inadequate evaluation, which slows down the advancement of the field. Moreover, most existing setups contain only node-level classification tasks. This paper represents a significant step forward in understanding the limitations of current models and provides a consistent, uniform framework for evaluating new models in a fair way.
- The paper points out a couple of limitations exhibited by current models, which represent good areas for future research. In particul

Weaknesses

- The node classification setup used in the paper appears similar to that of ED-HNN. However, the reported results are noticeably lower. Is there any major difference in the training setup?
- I agree that structural robustness is an important metric. However, for models that explicitly take connectivity into account, not observing a drop in performance at a 90% perturbation ratio seems more like a negative result than a positive one. The paper presents robust performance across different perturb
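Reviewer 01's concern can be tested directly: if accuracy barely moves after most of the hyperedge structure is randomized, the model may not be exploiting that structure. The sketch below shows one way to build such a perturbation, under stated assumptions: it rewires a fraction `ratio` of node-hyperedge memberships in the incidence matrix by deleting random existing memberships and adding the same number at random empty positions. DHG-Bench's actual perturbation protocol may differ, and `perturb_incidence` is a hypothetical helper.

```python
import numpy as np


def perturb_incidence(H, ratio, seed=None):
    """Hypothetical structural perturbation of a 0/1 incidence matrix H (n x m).

    Deletes round(ratio * nnz) existing node-hyperedge memberships and inserts
    the same number at positions that were empty in the original H, so the
    total number of memberships is preserved. Rewiring may leave some nodes
    isolated or some hyperedges empty.
    """
    rng = np.random.default_rng(seed)
    H = np.asarray(H).copy()
    ones = np.argwhere(H == 1)   # existing memberships as (node, hyperedge) pairs
    zeros = np.argwhere(H == 0)  # candidate positions for new memberships
    k = min(int(round(ratio * len(ones))), len(zeros))
    drop = ones[rng.choice(len(ones), size=k, replace=False)]
    add = zeros[rng.choice(len(zeros), size=k, replace=False)]
    H[drop[:, 0], drop[:, 1]] = 0
    H[add[:, 0], add[:, 1]] = 1
    return H


rng = np.random.default_rng(0)
H0 = (rng.random((20, 8)) < 0.3).astype(int)   # toy hypergraph: 20 nodes, 8 hyperedges
H90 = perturb_incidence(H0, ratio=0.9, seed=1)  # 90% of memberships rewired
print(H0.sum(), H90.sum(), (H0 != H90).sum())
```

Comparing an HNN's accuracy on `H90` against a structure-free baseline, such as an MLP on node features alone, is one way to tell genuine robustness apart from a model that ignores connectivity.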

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. **Comprehensive benchmark coverage.** DHG-Bench implements 17 HNN models and evaluates them on 22 datasets spanning node-, hyperedge-, and graph-level tasks.
2. **Multi-dimensional evaluation.** The benchmark goes beyond accuracy to assess efficiency, robustness, and fairness, providing a more holistic view of model behavior.
3. **Reproducibility focus.** The open-source code and datasets enable other researchers to replicate results and extend the benchmark.
4. **Insightful findings.** The e

Weaknesses

1. **Limited conceptual novelty.** The benchmark aggregates existing models but does not introduce new methodologies or theoretical advances.
2. **Insufficient graph-based baselines.** Only two GNNs are included, and simple but strong baselines (e.g., direction-aware GNNs) are missing, making it difficult to quantify the advantage of hypergraphs.
3. **Directed hypergraphs ignored.** The benchmark only evaluates undirected hypergraphs, omitting variants that model asymmetric or causal relations,
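Reviewer 02's second point turns on how graph baselines see hypergraph data. A common choice, assumed here rather than taken from DHG-Bench, is clique expansion: every pair of nodes that share a hyperedge becomes an edge, after which any GNN can run on the resulting graph. The sketch below (helper name hypothetical) also shows why stronger graph baselines matter for the comparison: clique expansion can map different hypergraphs to the same graph, discarding the higher-order information HNNs are meant to exploit.

```python
import numpy as np


def clique_expansion(H):
    """Adjacency matrix of the clique expansion of a 0/1 incidence matrix H (n x m).

    Nodes i != j are adjacent iff they co-occur in at least one hyperedge.
    """
    H = np.asarray(H)
    co_membership = H @ H.T                 # (n, n): number of shared hyperedges
    A = (co_membership > 0).astype(int)
    np.fill_diagonal(A, 0)                  # no self-loops
    return A


# One 3-node hyperedge {0, 1, 2} ...
H_triangle = np.array([[1], [1], [1]])
# ... versus three pairwise hyperedges {0, 1}, {1, 2}, {0, 2}.
H_pairs = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1]])

A1, A2 = clique_expansion(H_triangle), clique_expansion(H_pairs)
print(np.array_equal(A1, A2))  # True: both become the same triangle graph
```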

Reviewer 03 · Rating 6 · Confidence 4

Strengths

S1. The authors provided a timely benchmark for hypergraph neural networks.
S2. The benchmark is comprehensive, in terms of both HNNs and downstream tasks.
S3. Most of the benchmark hypergraph datasets have been covered.

Weaknesses

I do not have major criticisms of this work, but I have several suggestions.

**W1. [pip installation]** For now, I think one needs to download the GitHub repo to run the code. I think the authors could make the code easier to install and use, as in PyG (https://pytorch-geometric.readthedocs.io/en/2.4.0/install/installation.html).

**W2. [Label split]** While many HNNs use a 50/25/25 split for node classification, I personally think this ratio contains too many training nodes, compared to the common grap
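Reviewer 03's W2 contrasts the 50/25/25 split common in HNN papers with sparser splits. A random node split is easy to parameterize, so both settings can come from the same helper. This is a minimal sketch assuming uniform random (not class-stratified) splitting; the helper name and its defaults are hypothetical, not DHG-Bench's API.

```python
import numpy as np


def random_split(num_nodes, train_ratio=0.5, val_ratio=0.25, seed=0):
    """Hypothetical helper: shuffle node ids and cut them into train/val/test.

    Whatever remains after the train and validation slices becomes the test set.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(round(train_ratio * num_nodes))
    n_val = int(round(val_ratio * num_nodes))
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


# Dense split used by many HNN papers vs. a sparser, GNN-style setting.
dense = random_split(1000, train_ratio=0.5, val_ratio=0.25)
sparse = random_split(1000, train_ratio=0.1, val_ratio=0.1)
print([len(idx) for idx in dense], [len(idx) for idx in sparse])  # [500, 250, 250] [100, 100, 800]
```

A class-stratified variant, which samples the same fraction within each label, is another common choice and would keep rare classes represented in the smaller training sets.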

Taxonomy

Topics: Algorithms and Data Compression · Machine Learning and Data Classification · Machine Learning in Healthcare