Cross-Modal Prototype Augmentation and Dual-Grained Prompt Learning for Social Media Popularity Prediction

Ao Zhou; Mingsheng Tu; Luping Wang; Tenghao Sun; Zifeng Cheng; Yafeng Yin; Zhiwei Jiang; Qing Gu

arXiv:2508.16147·cs.IR·August 25, 2025

TL;DR

This paper introduces a novel multimodal framework for social media popularity prediction that enhances visual-textual alignment and captures hierarchical content patterns, achieving state-of-the-art results.

Contribution

It proposes hierarchical prototypes, contrastive learning, and dual-grained prompt learning to improve multimodal social media analysis.

Findings

01. Achieves state-of-the-art performance on benchmark datasets.

02. Effectively models cross-content correlations and hierarchical patterns.

03. Enhances visual-textual alignment through contrastive learning.

Abstract

Social Media Popularity Prediction is a complex multimodal task that requires effective integration of images, text, and structured information. However, current approaches suffer from inadequate visual-textual alignment and fail to capture the inherent cross-content correlations and hierarchical patterns in social media data. To overcome these limitations, we establish a multi-class framework, introducing hierarchical prototypes for structural enhancement and contrastive learning for improved vision-text alignment. Furthermore, we propose a feature-enhanced framework integrating dual-grained prompt learning and cross-modal attention mechanisms, achieving precise multimodal representation through fine-grained category modeling. Experimental results demonstrate state-of-the-art performance on benchmark metrics, establishing new reference standards for multimodal social media analysis.
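
The abstract names contrastive learning as the mechanism for vision-text alignment but does not state the loss. As an illustration only, the sketch below shows a generic symmetric InfoNCE-style objective of the kind commonly used for image-text alignment. It is an assumption, not the authors' formulation: the function names, the temperature value of 0.07, and the use of plain Python lists are all hypothetical choices made for readability.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cross_entropy_row(scores, target):
    """Negative log-softmax of scores[target], computed stably."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[target]


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric image-text contrastive loss (hypothetical sketch).

    image_embs[k] and text_embs[k] are assumed to describe the same post;
    every other pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # Cosine similarity between every image and every text, scaled by temperature.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]

    # Image-to-text: each row should score highest at its own column.
    i2t = sum(cross_entropy_row(sim[k], k) for k in range(n)) / n
    # Text-to-image: each column should score highest at its own row.
    t2i = sum(cross_entropy_row([sim[r][k] for r in range(n)], k)
              for k in range(n)) / n
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    aligned = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    shuffled = aligned[1:] + aligned[:1]  # every text paired with the wrong image
    print("matched pairs:   ", symmetric_contrastive_loss(aligned, aligned))
    print("mismatched pairs:", symmetric_contrastive_loss(aligned, shuffled))
```

In this toy batch the loss is lower when each image is paired with its own text than when the pairs are shuffled, which is the behaviour an alignment objective of this kind is meant to reward. How the paper combines such a term with its hierarchical prototypes and prompt learning is not specified in the abstract.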
