Seeing Further and Wider: Joint Spatio-Temporal Enlargement for Micro-Video Popularity Prediction

Dali Wang; Yunyao Zhang; Junqing Yu; Yi-Ping Phoebe Chen; Chen Xu; Zikai Song

arXiv:2604.20311 · cs.MM · April 24, 2026

TL;DR

This paper introduces a unified spatio-temporal enlargement framework for micro-video popularity prediction, enhancing content understanding and scalability by combining long-sequence perception with a hierarchical memory bank.

Contribution

It proposes a novel joint spatio-temporal enlargement approach with a Temporal Enlargement module and a Topology-Aware Memory Bank for scalable, long-range content modeling.

Findings

01. Outperforms 11 strong baselines on three MVPP benchmarks.

02. Achieves significant improvements in prediction accuracy.

03. Enhances ranking consistency across datasets.

Abstract

Micro-video popularity prediction (MVPP) aims to forecast the future popularity of videos on online media, which is essential for applications such as content recommendation and traffic allocation. In real-world scenarios, it is critical for MVPP approaches to understand both the temporal dynamics of a given video (temporal) and its historical relevance to other videos (spatial). However, existing approaches suffer from limitations in both dimensions: temporally, they rely on sparse short-range sampling that restricts content perception; spatially, they depend on flat retrieval memory with limited capacity and low efficiency, hindering scalable knowledge utilization. To overcome these limitations, we propose a unified framework that achieves joint spatio-temporal enlargement, enabling precise perception of extremely long video sequences while supporting a scalable memory bank that can…
