Learning hidden cascades via classification
Derrick Gilchrist Edward Manoharan, Anubha Goel, Alexandros Iosifidis, Henri Hansen, Juho Kanniainen

TL;DR
This paper introduces a machine learning framework called Distribution Classification that infers social spreading dynamics from observable intermediate indicators, outperforming existing methods in accuracy and scalability.
Contribution
The paper presents a novel partial observability-aware learning method that effectively estimates diffusion parameters using observable intermediate indicators, applicable to large networks.
Findings
Outperforms Approximate Bayesian Computation and GNN baselines
Accurately estimates transmission parameters across diverse settings
Successfully applied to a real-world insider trading network
Abstract
The spreading dynamics in social networks are often studied under the assumption that individuals' statuses, whether informed or infected, are fully observable. However, in many real-world situations such statuses remain unobservable, even though they are crucial for determining an individual's potential to spread the infection further. While final statuses are hidden, intermediate indicators such as symptoms of infection are observable and provide useful representations of the underlying diffusion process. We propose a partial observability-aware machine learning framework to learn the characteristics of the spreading model. We term the method Distribution Classification, which utilizes the power of classifiers to infer the underlying transmission dynamics. Through extensive benchmarking against Approximate Bayesian Computation and GNN-based baselines, our framework consistently outperforms these…
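As a rough illustration of how a classifier can drive parameter inference under hidden statuses, the toy sketch below picks the transmission probability whose simulated symptom data a classifier finds hardest to tell apart from the observed data. It is not the authors' implementation: the one-hop cascade, the symptom probabilities `show_if_infected` and `show_if_healthy`, the lookup-table classifier, and the grid search are all assumptions made for this example.

```python
import random
from collections import Counter


def simulate(p, n_samples, rng, k=10, show_if_infected=0.9, show_if_healthy=0.1):
    """Toy one-hop cascade: a source exposes k contacts, each infected with
    probability p. Infection is hidden; only the number of contacts showing
    symptoms is returned. The symptom probabilities are assumed values."""
    samples = []
    for _ in range(n_samples):
        symptomatic = 0
        for _ in range(k):
            infected = rng.random() < p  # hidden status, never returned
            show = show_if_infected if infected else show_if_healthy
            if rng.random() < show:      # observable intermediate indicator
                symptomatic += 1
        samples.append(symptomatic)
    return samples


def heldout_accuracy(observed, simulated):
    """Fit a lookup-table classifier (observed vs. simulated) on the first half
    of each equally sized set and return its accuracy on the second halves."""
    half = len(observed) // 2
    obs_counts = Counter(observed[:half])
    sim_counts = Counter(simulated[:half])

    def predicts_observed(x):
        # Equal training sizes, so comparing counts compares frequencies.
        return obs_counts[x] >= sim_counts[x]

    correct = sum(predicts_observed(x) for x in observed[half:])
    correct += sum(not predicts_observed(x) for x in simulated[half:])
    return correct / (len(observed) - half + len(simulated) - half)


def estimate_p(observed, grid, rng):
    """Return the candidate p with the lowest held-out accuracy, i.e. the one
    whose simulations are least distinguishable from the observations."""
    scores = {}
    for p in grid:
        simulated = simulate(p, len(observed), rng)
        scores[p] = heldout_accuracy(observed, simulated)
    return min(scores, key=scores.get), scores


rng = random.Random(0)
observed = simulate(0.3, 4000, rng)   # stand-in for real data, true p = 0.3
grid = [i / 10 for i in range(1, 10)]
p_hat, scores = estimate_p(observed, grid, rng)
print("estimated p:", p_hat)
```

In this sketch, an accuracy near 0.5 means the classifier cannot separate simulated from observed samples, so the candidate parameter is treated as plausible; accuracies well above 0.5 indicate a mismatch. A finer grid or a richer feature vector would be natural extensions, but those choices are outside what the summary above specifies.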
Peer Reviews
Decision: Submitted to ICLR 2026
S1. The paper offers a rigorous and practically meaningful formulation of the HC problem, extending cascade inference to partially observable and noisy scenarios. The motivation, particularly in epidemiological and financial contexts, is compelling.
S2. The key innovation lies in reframing parameter recovery as a distributional classification problem, enabling flexible and fine-grained inference through entity-specific classifiers rather than global aggregation.
S3. The experimental evaluation
W1. The paper lacks theoretical or empirical analysis of identifiability for p and q. Without conditions ensuring uniqueness or consistency, the interpretability of the inferred parameters remains uncertain.
W2. Sections 3.1–3.2 could be more rigorous and transparent. Key details such as negative sampling, class imbalance handling, and aggregation of classifier outputs are under-specified, and notation inconsistencies (e.g., q, b_1, b_2) may hinder reproducibility.
W3. Baseline coverage is lim
NA
- Very simple model for which many parameter identification approaches could be applied. The authors dismiss models like Markov Random Fields with inference via loopy belief propagation, or Bayesian methods such as Gibbs sampling or EM, claiming they do not scale. I suspect that the proposed method does not scale either, while offering no guarantees against oscillations or divergence. Numerous modern methods based on probabilistic neural networks could equally be applied in this setting. Positi
1. The authors introduce a novel problem, termed the Hidden Cascade problem, in which infection states are latent and only indirect symptoms are observable.
2. The proposed two-sided observation model extends the conventional one-sided assumption.
3. The Distribution Classification (DC) framework reformulates MLE-based inference as a classification task that minimizes average classification accuracy, inspired by adversarial distribution matching.
4. The proposed approach is model-agnostic.
1. The figures (e.g., Figure 1) could be more intuitive and visually informative.
2. This manuscript lacks an ablation study to examine how different feature vectors affect inference performance.
3. The proposed HC problem is conceptually similar to micro-level prediction in diffusion modeling, where the goal is to infer the infection status of each node in a network.
4. More competitive baselines, such as point-process-based models, should be included for comparison.
5. Since the authors cl
Taxonomy
Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence · COVID-19 epidemiological studies
