An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify
Shivam Verma, Vivian Chen, Darren Mei

TL;DR
This paper presents CAMoE, a multi-task learning framework that improves ad click-through rate prediction across audio, video, and display formats on Spotify by leveraging modality-aware techniques and deep feature interactions.
Contribution
The paper introduces CAMoE, a novel multi-task learning framework that effectively integrates multi-modal ad data for improved CTR prediction in streaming platforms.
Findings
Achieved a 14.5% increase in CTR for audio ads.
Improved multi-modal ad performance with near Pareto-optimal results.
Delivered significant CTR gains and cost reductions in large-scale deployment.
Abstract
Spotify, a large-scale multimedia platform, attracts over 675 million monthly active users who collectively consume millions of hours of music, podcasts, audiobooks, and video content. This diverse content consumption pattern introduces unique challenges for computational advertising, which must effectively integrate a variety of ad modalities, including audio, video, and display, within a single user experience. Traditional ad recommendation models, primarily designed for foregrounded experiences, often struggle to reconcile the platform's inherent audio-centrality with the demands of optimizing ad performance across multiple formats and modalities. To overcome these challenges, we introduce Cross-modal Adaptive Mixture-of-Experts (CAMoE), a novel framework for optimizing click-through rate (CTR) prediction in both audio-centric and multi-modal settings. CAMoE enhances traditional…
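The abstract is cut off before it describes CAMoE's architecture, so the following is only a minimal sketch of a generic multi-gate mixture-of-experts for multi-task CTR prediction, not the authors' model. It assumes each ad format (audio, video, display) is treated as its own task with its own softmax gate over shared experts. The class name `ToyMultiGateMoE`, the layer sizes, and the random weights are all hypothetical and chosen purely for illustration.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def linear(weights, x):
    # Matrix-vector product; weights is a list of rows.
    return [dot(row, x) for row in weights]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyMultiGateMoE:
    """Hypothetical illustration: shared experts, one gate and tower per task."""

    def __init__(self, n_features, n_experts, expert_dim, tasks, seed=0):
        rng = random.Random(seed)
        self.tasks = list(tasks)
        # Experts are shared across all tasks.
        self.experts = [init_matrix(rng, expert_dim, n_features)
                        for _ in range(n_experts)]
        # Each task gets its own gate (one logit per expert).
        self.gates = {t: init_matrix(rng, n_experts, n_features)
                      for t in self.tasks}
        # Each task gets its own tower, here a single logistic unit.
        self.towers = {t: [rng.uniform(-0.5, 0.5) for _ in range(expert_dim)]
                       for t in self.tasks}

    def gate_weights(self, task, x):
        # Softmax over experts, conditioned on the input features.
        return softmax(linear(self.gates[task], x))

    def predict(self, x):
        # ReLU expert outputs, computed once and reused by every task.
        expert_out = [[max(0.0, v) for v in linear(w, x)] for w in self.experts]
        preds = {}
        for t in self.tasks:
            g = self.gate_weights(t, x)
            # Task-specific convex combination of the expert outputs.
            mixed = [sum(g[k] * expert_out[k][d] for k in range(len(expert_out)))
                     for d in range(len(expert_out[0]))]
            # Predicted click probability for this task.
            preds[t] = sigmoid(dot(self.towers[t], mixed))
        return preds


model = ToyMultiGateMoE(n_features=4, n_experts=3, expert_dim=2,
                        tasks=["audio", "video", "display"])
print(model.predict([0.2, -1.0, 0.5, 1.0]))
```

In this kind of design the experts are shared while the gates are per task, so each format can weight the shared representations differently. Whether CAMoE uses this exact gating scheme cannot be determined from the truncated abstract.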
Taxonomy
Methods: Linear Warmup With Linear Decay · Attention Dropout · Softmax · Layer Normalization · Dropout · BERT · Dense Connections · Vision Transformer · CAMoE
