Frequent subgraph-based persistent homology for graph classification
Xinyang Chen, Amaël Broustet, Guanyuan Zeng, Cheng He, Guoting Chen

TL;DR
This paper introduces a novel frequent subgraph filtration method for persistent homology, enhancing graph classification by capturing richer topological features and integrating with neural networks for improved accuracy.
Contribution
The work proposes the Frequent Subgraph Filtration (FSF) for persistent homology, providing theoretical validation and novel graph classification frameworks combining topological features with machine learning and GNNs.
Findings
FPH-ML achieves competitive accuracy with existing methods.
Integrating FPH into GNNs improves performance by up to 21%.
Theoretical properties of FSF are validated experimentally.
Abstract
Persistent homology (PH) has recently emerged as a powerful tool for extracting topological features. Integrating PH into machine learning and deep learning models enhances topology awareness and interpretability. However, most PH methods on graphs rely on a limited set of filtrations, such as degree-based or weight-based filtrations, which overlook richer features like recurring information across the dataset and thus restrict expressive power. In this work, we propose a novel graph filtration called Frequent Subgraph Filtration (FSF), which is derived from frequent subgraphs and produces stable and information-rich frequency-based persistent homology (FPH) features. We study the theoretical properties of FSF and provide both proofs and experimental validation. Beyond persistent homology itself, we introduce two approaches for graph classification: an FPH-based machine learning model…
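To make the idea of a frequency-derived filtration concrete, the following is a minimal toy sketch, not the authors' FSF construction (this summary does not give its definition). It assumes the simplest possible "subgraph pattern", a single edge identified by its two endpoint labels, and assumes a filtration value of 1 minus the pattern's support, so that patterns recurring across more graphs enter earlier. The names `edge_pattern`, `pattern_support`, and `frequency_persistence`, as well as the four-graph dataset, are hypothetical and chosen only for illustration.

```python
from collections import Counter

def edge_pattern(labels, u, v):
    """Unordered pair of endpoint labels (assumed stand-in for a mined subgraph)."""
    return tuple(sorted((labels[u], labels[v])))

def pattern_support(dataset):
    """Fraction of graphs in which each labeled-edge pattern occurs at least once."""
    counts = Counter()
    for labels, edges in dataset:
        counts.update({edge_pattern(labels, u, v) for u, v in edges})
    return {p: c / len(dataset) for p, c in counts.items()}

def frequency_persistence(labels, edges, support):
    """H0 and H1 intervals of one graph; edges enter at 1 - support (assumption)."""
    parent = list(range(len(labels)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    valued = sorted((1.0 - support[edge_pattern(labels, u, v)], u, v)
                    for u, v in edges)
    h0, h1 = [], []
    for t, u, v in valued:
        ru, rv = find(u), find(v)
        if ru == rv:
            # Edge closes a cycle; in a graph (1-dimensional complex) it never dies.
            h1.append((t, float("inf")))
        else:
            # Edge merges two components; all vertices are born at 0.
            parent[ru] = rv
            h0.append((0.0, t))
    # Every remaining connected component is an essential H0 class.
    components = len({find(x) for x in range(len(labels))})
    h0.extend([(0.0, float("inf"))] * components)
    return h0, h1

# Hypothetical dataset: each graph is (node labels, edge list).
dataset = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),          # path C-C-O
    (["C", "C", "C"], [(0, 1), (1, 2), (0, 2)]),  # triangle of C
    (["C", "O"], [(0, 1)]),                       # single C-O edge
    (["C", "C", "N"], [(0, 1), (1, 2)]),          # path C-C-N
]
support = pattern_support(dataset)
h0, h1 = frequency_persistence(*dataset[1], support)
```

In this toy setting the C-C pattern appears in three of four graphs, so the triangle's edges all enter at 0.25 and its cycle is born at that value. A real pipeline would mine larger patterns with a dedicated frequent subgraph mining algorithm and would likely use a persistent homology library rather than a hand-written union-find.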
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The paper proposes a new filtration paradigm for persistent homology beyond local structural heuristics.
2. The proposed FSF, as demonstrated by the paper, captures recurring, dataset-level motifs, addressing a known limitation of conventional graph filtrations.
3. The paper provides theoretical results establishing bounded PH dimensionality, monotonicity, and graph isomorphism invariance.
1. I think Figure 1 needs refinement.
(1) In Figure 1, please articulate which motifs FSF tends to admit as t increases and why they are meaningful. For instance, Vietoris–Rips filtrations emphasize distance-based proximity; analogously, FSF should make explicit whether it prioritizes cycles, cliques, or other recurring patterns, and how these relate to downstream discriminative power.
(2) To facilitate visual comparison, keep node layouts and orderings consistent across the three subpanels in
- The method demonstrates how to integrate information about frequent subgraph patterns into persistent homology.
- It combines frequent subgraph information with machine learning-based and graph neural network models to improve the networks' topological information for downstream graph analytics tasks.
- The idea of injecting global tokens looks beneficial for graph learning.
- In comprehensive experiments, the models outperform the baseline methods.
- No runtime analysis (time complexity) or wall-clock time has been provided for the entire system.
- The manuscript does not explicitly explain how the method trades off against the redundancy of frequent patterns in networks.
- The models lack generality. It seems the models are mostly evaluated on biomedical-domain datasets. Are they efficient on large-scale graphs, including social network datasets such as COLLAB, REDDIT-BINARY, REDDIT-MULTI, IMDB-BINARY, and IMDB-MULTI?
- Already 1-parameter /
The reviewer appreciates the rigorous treatment of the theory in the paper, which shows great care for detail. Moreover, the results shown are promising.
The reviewer is not an expert on Frequent Subgraph counting (and likely neither are most readers) and from this perspective the paper is a bit hard to follow at times. Providing a heuristic explanation of the aims and implications of the theory and method is common in machine learning papers. In contrast, in pure mathematics, brevity is usually encouraged. Finding this balance can be challenging, but allowing the paper to have a bit more focus on the former style would most certainly lead to g
1. The integration of frequent subgraph mining with persistent homology is a creative and underexplored idea. Most previous TDA-based graph classifiers rely on simplicial complexes or clique filtrations, but this paper innovatively constructs filtrations around discrete subgraph patterns, capturing mid-scale structures not accessible to node- or clique-based PH.
2. The method is theoretically grounded: it provides a clear mapping from mined substructures to topological spaces, defines filtration
1. **Scalability Concerns:** The combination of frequent subgraph mining and persistent homology is computationally expensive. The paper should include an empirical runtime study and potential heuristics to limit the search space (e.g., maximum subgraph size, support thresholds).
2. **Limited Theoretical Novelty in PH Component:** The topological theory used (PH stability, persistence diagram vectorization) is largely standard. The novelty lies in the integration rather than new homological theo
1. Conceptual novelty: The idea of deriving filtrations from frequent subgraph mining instead of numeric thresholds is innovative and bridges graph mining with topological data analysis.
2. Global-topology perspective: By performing FSM over the whole dataset, the method captures cross-graph structural stability rather than per-graph local features, leading to better interpretability.
3. Robustness and stability: The FSF-based persistence is shown to be resilient to edge perturbations
1. Partial topology coverage: Because only frequent subgraph patterns are used, the generated simplicial complexes do not fully represent the original graphs. The authors should report the coverage ratio between the constructed complexes and the original graphs.
2. Dependency on FSM quality: The method’s success heavily relies on the chosen frequent subgraph mining algorithm and its support threshold (\sigma), which could vary drastically across datasets.
3. Scalabili
Taxonomy
Topics: Topological and Geometric Data Analysis · Advanced Graph Neural Networks · Bioinformatics and Genomic Networks
