1st Place Solution of WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge
Junwei Xu, Zehao Zhao, Xiaoyu Hu, Zhenjie Song

TL;DR
This paper presents the winning solution to the WWW 2025 EReL@MIR multimodal CTR prediction challenge. It combines sequential modeling and feature interaction learning with frozen multimodal item embeddings to improve click-through rate (CTR) prediction.
Contribution
The paper introduces a simple yet effective method of integrating multimodal embeddings with user-item interaction modeling for CTR prediction.
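The abstract describes the integration step as appending frozen multimodal embeddings to each item embedding. A minimal sketch of that idea, assuming each item has a trainable ID embedding and a precomputed multimodal vector, is shown below. The class `ItemEmbedding`, its methods, and the toy dimensions are hypothetical illustrations, not the authors' code; the challenge's real embedding sizes and storage format are not stated here.

```python
import random


class ItemEmbedding:
    """Hypothetical item representation: trainable ID vector + frozen multimodal vector."""

    def __init__(self, num_items, id_dim, frozen_mm, seed=0):
        if len(frozen_mm) != num_items:
            raise ValueError("need exactly one multimodal vector per item")
        rng = random.Random(seed)
        self.id_dim = id_dim
        # Trainable part: small random initialization, updated by apply_grad.
        self.id_table = [[rng.gauss(0.0, 0.01) for _ in range(id_dim)]
                         for _ in range(num_items)]
        # Frozen part: stored as tuples so it is never modified in place.
        self.mm_table = [tuple(v) for v in frozen_mm]

    def lookup(self, item_id):
        # Append the frozen multimodal vector to the trainable ID embedding.
        return self.id_table[item_id] + list(self.mm_table[item_id])

    def apply_grad(self, item_id, grad, lr=0.1):
        # Only the first id_dim gradient entries touch trainable parameters;
        # the entries for the frozen multimodal dimensions are ignored.
        for k in range(self.id_dim):
            self.id_table[item_id][k] -= lr * grad[k]


frozen = [[1.0, 0.0, 0.5], [0.2, 0.3, 0.4]]  # toy stand-ins for precomputed multimodal vectors
emb = ItemEmbedding(num_items=2, id_dim=4, frozen_mm=frozen)
vec = emb.lookup(1)  # 4 trainable dims followed by 3 frozen dims
```

Keeping the multimodal part out of the update step is what "frozen" means here: the model can still learn item-specific ID features, while the pretrained multimodal signal is passed through unchanged.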
Findings
Achieved 0.9839 AUC on the Task 2 leaderboard
Outperformed the challenge baseline model by a wide margin
Demonstrated the effectiveness of appending frozen multimodal embeddings to item embeddings
Abstract
The WWW 2025 EReL@MIR Workshop Multimodal CTR Prediction Challenge focuses on effectively applying multimodal embedding features to improve click-through rate (CTR) prediction in recommender systems. This technical report presents our 1st-place solution for Task 2, which combines sequential modeling and feature interaction learning to capture user-item interactions. To integrate multimodal information, we simply append the frozen multimodal embeddings to each item embedding. Experiments on the challenge dataset demonstrate the effectiveness of our method, which achieves an AUC of 0.9839 on the leaderboard, well above the baseline model. Code and configuration are available in our GitHub repository, and the model checkpoint is available on Hugging Face.
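The abstract names two modeling components, sequential modeling and feature interaction learning, without specifying the exact layers. The sketch below illustrates how such a pipeline can fit together, using target attention over the user's behavior sequence (in the style of DIN) and a single DCN-style cross layer as stand-ins chosen only for illustration. All function names, weights, and dimensions are hypothetical and should not be read as the authors' architecture.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def target_attention(history, target):
    """Pool the behavior sequence, weighting each item by scaled similarity to the target."""
    d = len(target)
    weights = softmax([dot(h, target) / math.sqrt(d) for h in history])
    return [sum(w * h[k] for w, h in zip(weights, history)) for k in range(d)]


def cross_layer(x0, x, w, b):
    """One DCN-style cross layer: x0 * (x . w) + b + x, with a vector weight w."""
    s = dot(x, w)
    return [x0_k * s + b_k + x_k for x0_k, x_k, b_k in zip(x0, x, b)]


def predict_ctr(history, target, w_cross, b_cross, w_out, b_out):
    interest = target_attention(history, target)  # sequential modeling step
    x0 = interest + target                        # concatenate user interest and target item
    x1 = cross_layer(x0, x0, w_cross, b_cross)    # explicit feature interaction step
    logit = dot(x1, w_out) + b_out
    return 1.0 / (1.0 + math.exp(-logit))         # sigmoid gives a click probability


history = [[1.0, 0.0], [0.0, 1.0]]  # toy item embeddings from a user's click history
target = [1.0, 0.0]                 # toy embedding of the candidate item
p = predict_ctr(history, target, [0.1] * 4, [0.0] * 4, [0.5, -0.2, 0.3, 0.1], -0.1)
```

In a full model, the item vectors fed to both components would be the concatenated ID-plus-multimodal embeddings, and every weight would be learned with a binary cross-entropy loss on click labels.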
