MDL-Pool: Adaptive Multilevel Graph Pooling Based on Minimum Description Length
Jan von Pichowski, Christopher Blöcker, Ingo Scholtes

TL;DR
MDL-Pool introduces an adaptive graph pooling method based on the MDL principle, effectively modeling hierarchical interdependencies and selecting optimal pooling depths for improved graph classification.
Contribution
It proposes a novel MDL-based pooling operator that explicitly accounts for hierarchical interdependencies and adapts to varying graph sizes, unlike fixed-depth approaches.
Findings
Competitive performance on standard graph classification datasets
Effective modeling of hierarchical interdependencies
Adaptive pooling depth selection improves accuracy
Abstract
Graph pooling compresses graphs and summarises their topological properties and features in a vectorial representation. It is an essential part of deep graph representation learning and is indispensable in graph-level tasks like classification or regression. Current approaches pool hierarchical structures in graphs by iteratively applying shallow pooling operators up to a fixed depth. However, they disregard the interdependencies between structures at different hierarchical levels and do not adapt to datasets that contain graphs with different sizes that may require pooling with various depths. To address these issues, we propose MDL-Pool, a pooling operator based on the minimum description length (MDL) principle, whose loss formulation explicitly models the interdependencies between different hierarchical levels and facilitates a direct comparison between multiple pooling alternatives…
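The idea of comparing pooling alternatives by description length can be illustrated with a small sketch. The reviews below mention that the method builds on the map equation, so this example uses the standard two-level map equation for a hard partition of an undirected graph. This is an assumption made for illustration: it is not the paper's differentiable multilevel loss, and the function names (`plogp_sum`, `one_level_codelength`, `two_level_codelength`, `select_pooling`) are hypothetical.

```python
import numpy as np


def plogp_sum(x):
    """Sum of x * log2(x) over the positive entries of x (0 * log 0 := 0)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    x = x[x > 0]
    return float(np.sum(x * np.log2(x)))


def one_level_codelength(adj):
    """Codelength with no pooling: entropy of the node visit rates."""
    adj = np.asarray(adj, dtype=float)
    p = adj.sum(axis=1) / adj.sum()
    return -plogp_sum(p)


def two_level_codelength(adj, labels):
    """Two-level map equation for a hard partition of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    flow = adj / adj.sum()                      # flow per directed link
    p = flow.sum(axis=1)                        # node visit rates
    modules = np.unique(labels)
    member = (labels[:, None] == modules[None, :]).astype(float)
    module_flow = member.T @ flow @ member      # flow between modules
    p_mod = module_flow.sum(axis=1)             # visit rate per module
    q_exit = p_mod - np.diag(module_flow)       # exit rate per module
    return (
        plogp_sum(q_exit.sum())
        - 2.0 * plogp_sum(q_exit)
        - plogp_sum(p)
        + plogp_sum(q_exit + p_mod)
    )


def select_pooling(adj, candidates):
    """Pick the candidate partition (or no pooling) with the shortest code."""
    best_name, best_len = "no pooling", one_level_codelength(adj)
    for name, labels in candidates.items():
        length = two_level_codelength(adj, labels)
        if length < best_len:
            best_name, best_len = name, length
    return best_name, best_len


# Toy graph: two triangles joined by a single bridge edge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

candidates = {
    "triangles": [0, 0, 0, 1, 1, 1],
    "alternating": [0, 1, 0, 1, 0, 1],
}
choice, length = select_pooling(adj, candidates)
print(choice, round(length, 3))
```

On this toy graph the partition into the two triangles yields a shorter code than either the alternating partition or no pooling at all, which is the kind of direct comparison between alternatives that an MDL criterion permits.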
Peer Reviews
Decision: Submitted to ICLR 2026
1. The method automatically determines the optimal pooling depth per graph instance, addressing a long-standing hyperparameter issue in hierarchical pooling.
2. The paper provides experiments on both synthetic and real-world datasets, including ablations on architecture variants and pooling depths.
Limited performance gain: In Tables 2 and 3, MDL-Pool does not consistently outperform baselines. For community detection, results are comparable or even worse than baselines on several datasets. Similarly, in graph classification, MDL-Pool’s average accuracy is not higher than several baselines, indicating limited empirical advantage. Insufficient justification of benefits: While the motivation is sound, the claimed benefits (interdependency modeling and adaptive depth) are not strongly suppor
The map equation is beneficial for observing the overall networks and for relevant clustering. The clustering helps for balanced training to fit the model in downstream graph analytics tasks. MDL is beneficial because it detects the depth of the input graph, which assists in effective hierarchical graph learning. The comprehensive result is better than other baselines
In the experiment, the authors did not mention the hyperparameter's impact on the model. The manuscript does not provide runtime details. Is minimum description length feasible on large volume datasets? The optimization of map equations involves nested matrix operations, which can result in a computationally heavy model. Please check the model's runtime with respect to simpler pooling operations like Top-kPool and SAGPool. In the case of community detection, the datasets are very sparse. Is t
- The integration of the MDL principle and map equation into deep graph pooling is well-motivated. It provides a principled way to address overfitting and model complexity while enhancing interpretability.
- The proposed multilevel loss seamlessly integrates hierarchical information, overcoming optimization issues caused by layer-wise independence in stacked pooling.
- The MDL framework naturally implements Occam’s razor, removing the need for hyperparameter tuning for cluster count or levels.
- The MDL-based loss focuses on topological structure and does not fully leverage node features in evaluating community quality, which might reduce performance on feature-dominant tasks.
- Experiments show most graphs select only one or two pooling levels; it remains unclear whether MDL-Pool is beneficial in tasks with truly deep hierarchies.
- The computation of multilevel flow matrices has quadratic cost in graph size, which may hinder scalability to very large graphs. No experiments on large-
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Graph Neural Networks · Natural Language Processing Techniques
Methods: Minimum Description Length
