Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity Prediction
Ao Zhou, Mingsheng Tu, Luping Wang, Tenghao Sun, Zifeng Cheng, Yafeng Yin, Zhiwei Jiang, Qing Gu

TL;DR
This paper introduces a novel multimodal framework for social media popularity prediction that enhances visual-textual alignment and captures hierarchical content patterns, achieving state-of-the-art results.
Contribution
It proposes hierarchical prototypes, contrastive learning, and dual-grained prompt learning to improve multimodal social media analysis.
Findings
Achieves state-of-the-art performance on benchmark datasets.
Effectively models cross-content correlations and hierarchical patterns.
Enhances visual-textual alignment through contrastive learning.
Abstract
Social Media Popularity Prediction is a complex multimodal task that requires effective integration of images, text, and structured information. However, current approaches suffer from inadequate visual-textual alignment and fail to capture the inherent cross-content correlations and hierarchical patterns in social media data. To overcome these limitations, we establish a multi-class framework , introducing hierarchical prototypes for structural enhancement and contrastive learning for improved vision-text alignment. Furthermore, we propose a feature-enhanced framework integrating dual-grained prompt learning and cross-modal attention mechanisms, achieving precise multimodal representation through fine-grained category modeling. Experimental results demonstrate state-of-the-art performance on benchmark metrics, establishing new reference standards for multimodal social media analysis.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
