Enhancing CTR Prediction with De-correlated Expert Networks
Jiancheng Wang, Mingjia Yin, Hao Wang, Enhong Chen

TL;DR
This paper introduces a De-Correlated Mixture-of-Experts (D-MoE) framework for CTR prediction. It uses a Cross-Expert De-Correlation loss to reduce correlation among experts and a Cross-Expert Correlation metric to measure it, leading to improved advertising performance.
Contribution
It proposes a new D-MoE framework with a Cross-Expert De-Correlation loss and a Cross-Expert Correlation metric, demonstrating that reducing expert correlation enhances CTR prediction accuracy.
Findings
D-MoE achieves a 1.19% GMV lift in online A/B tests.
De-correlation strategies are mutually compatible and improve performance.
Extensive experiments validate the effectiveness of the proposed methods.
Abstract
Modeling feature interactions is essential for accurate click-through rate (CTR) prediction in advertising systems. Recent studies have adopted the Mixture-of-Experts (MoE) approach to improve performance by ensembling multiple feature interaction experts. These studies employ various strategies, such as learning independent embedding tables for each expert or utilizing heterogeneous expert architectures, to differentiate the experts, which we refer to as expert de-correlation. However, it remains unclear whether these strategies effectively achieve de-correlated experts. To address this, we propose a De-Correlated MoE (D-MoE) framework, which introduces a Cross-Expert De-Correlation loss to minimize expert correlations. Additionally, we propose a novel metric, termed Cross-Expert Correlation, to quantitatively evaluate the expert de-correlation degree. Based on this metric, we identify a…
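To make the idea of measuring correlation between experts concrete, here is a minimal sketch in Python. The abstract does not give the formula for the Cross-Expert De-Correlation loss or the Cross-Expert Correlation metric, so everything below is an assumption for illustration: the function name `cross_expert_correlation`, the choice of Pearson correlation between standardized expert outputs, and the averaging of squared correlations over expert pairs are hypothetical and may differ from the paper's definition.

```python
import numpy as np


def cross_expert_correlation(expert_outputs, eps=1e-8):
    """Mean squared Pearson correlation between outputs of every expert pair.

    Hypothetical illustration, not the paper's definition.
    expert_outputs: list of arrays, each shaped (batch, dim).
    Returns a scalar that is close to 0 when experts are uncorrelated.
    """
    # Standardize each output dimension over the batch (zero mean, unit std).
    normed = []
    for h in expert_outputs:
        h = np.asarray(h, dtype=np.float64)
        normed.append((h - h.mean(axis=0)) / (h.std(axis=0) + eps))

    batch = normed[0].shape[0]
    total, pairs = 0.0, 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            # (dim_i, dim_j) matrix of correlations between the two experts.
            corr = normed[i].T @ normed[j] / batch
            total += np.mean(corr ** 2)
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(2000, 4))
    b = rng.normal(size=(2000, 4))
    print("independent experts:", cross_expert_correlation([a, b]))
    print("identical experts:  ", cross_expert_correlation([a, a]))
```

Under these assumptions, the same quantity could serve both roles described in the abstract: reported on held-out data as a metric, or added to the CTR training loss with a weighting coefficient (a hypothetical hyperparameter) as a de-correlation penalty. In the second case it would need to be written in an autodiff framework rather than NumPy.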
