Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction
Jie Peng, Rui Wang, Qiang Wang, Zhewei Wei, Bin Tong, Guan Wang, Bo Zheng

TL;DR
This paper introduces a realistic evaluation framework, a new large-scale dataset, and an efficient model for predicting information cascade popularity in social networks, addressing key limitations of prior work.
Contribution
It proposes a time-ordered data splitting strategy, introduces the Taoke dataset with rich conversion signals, and develops CasTemp, a fast, effective cascade modeling framework.
Findings
CasTemp achieves state-of-the-art performance on four datasets.
CasTemp significantly speeds up training compared to complex graph-based methods.
The new Taoke dataset captures the complete cascade lifecycle, including conversions.
Abstract
Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) temporal leakage in current evaluation, where random cascade-based splits allow models to access future information, yielding unrealistic results; (2) feature-poor datasets that lack downstream conversion signals (e.g., likes, comments, or purchases), which limits more practical applications; (3) computational inefficiency of complex graph-based methods that require days of training for marginal gains. We systematically address these challenges from three perspectives: task setup, dataset construction, and model design. First, we propose a time-ordered splitting strategy that chronologically partitions data into consecutive windows, ensuring models are evaluated on genuine forecasting tasks without future…
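The time-ordered splitting idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the `Cascade` record, its `start_time` field, the cut-off parameters `train_end` and `val_end`, and the function name `time_ordered_split` are all hypothetical names chosen for illustration. The sketch only assumes that each cascade has a publication timestamp and that the data is cut into three consecutive windows, so every validation and test cascade starts after every training cascade.

```python
from dataclasses import dataclass


@dataclass
class Cascade:
    # Hypothetical record: an identifier plus the time the root post appeared.
    cascade_id: str
    start_time: int


def time_ordered_split(cascades, train_end, val_end):
    """Partition cascades into consecutive train/val/test windows by start time.

    Assumed convention: train covers [.., train_end), validation covers
    [train_end, val_end), and test covers [val_end, ..).
    """
    if train_end > val_end:
        raise ValueError("train_end must not be later than val_end")
    train = [c for c in cascades if c.start_time < train_end]
    val = [c for c in cascades if train_end <= c.start_time < val_end]
    test = [c for c in cascades if c.start_time >= val_end]
    return train, val, test


# Toy data: ten cascades published at times 0, 10, ..., 90 (arbitrary units).
cascades = [Cascade(f"c{i}", i * 10) for i in range(10)]
train, val, test = time_ordered_split(cascades, train_end=60, val_end=80)
```

Unlike a random split, no cascade in `val` or `test` begins before the training window closes. The sketch partitions by start time only; a full pipeline would presumably also need to make sure that features and labels for training cascades are computed from events inside the training window, which is left out here.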
