AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction
YaChen Yan, Liubo Li

TL;DR
AdaEnsemble introduces a dynamic, sparsely-gated mixture-of-experts architecture for CTR prediction that adaptively models feature interactions and optimizes prediction depth, improving accuracy and efficiency.
Contribution
This paper proposes AdaEnsemble, a novel adaptive ensemble architecture with dynamic routing and early exiting for improved CTR prediction performance.
Findings
Outperforms state-of-the-art models on real-world datasets.
Effectively models diverse feature interactions.
Enhances prediction accuracy and inference efficiency.
Abstract
Learning feature interactions is crucial to the success of large-scale CTR prediction in recommender systems and ads ranking. Researchers and practitioners have proposed a wide variety of neural network architectures for searching and modeling feature interactions. However, we observe that different datasets favor different neural network architectures and feature interaction types, suggesting that each feature interaction learning method may have its own unique advantages. Inspired by this observation, we propose AdaEnsemble: a Sparsely-Gated Mixture-of-Experts (SparseMoE) architecture that leverages the strengths of heterogeneous feature interaction experts and adaptively learns the routing to a sparse combination of experts for each example, allowing us to build a dynamic hierarchy of feature interactions of different types and orders. To further improve the prediction…
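The routing idea in the abstract (each example is sent to a sparse combination of experts) can be illustrated with a generic top-k gate. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names `top_k_gate` and `sparse_moe`, the toy experts, and the choice to pass gate logits in directly (rather than computing them from a learned gating network) are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k):
    """Keep the k largest logits, softmax over them, and zero out the rest."""
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    gates = [0.0] * len(gate_logits)
    for i, w in zip(top, weights):
        gates[i] = w
    return gates


def sparse_moe(x, experts, gate_logits, k=2):
    """Weighted sum of the selected experts; unselected experts are never run."""
    gates = top_k_gate(gate_logits, k)
    out = 0.0
    for g, expert in zip(gates, experts):
        if g > 0.0:
            out += g * expert(x)
    return out


# Hypothetical toy "experts" standing in for heterogeneous interaction modules.
def linear_expert(x):
    return sum(x)


def pairwise_expert(x):
    # Sum of all second-order products x[i] * x[j] with i < j.
    return sum(x[i] * x[j] for i in range(len(x)) for j in range(i + 1, len(x)))


def max_expert(x):
    return max(x)


experts = [linear_expert, pairwise_expert, max_expert]
# In a real model these logits would come from a gating network applied to x.
prediction = sparse_moe([1.0, 2.0], experts, gate_logits=[0.0, 0.0, -5.0], k=2)
```

With these logits only the first two experts are evaluated, each with weight 0.5; the third expert is skipped entirely, which is where the compute saving of sparse gating comes from.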
Taxonomy
Topics: Recommender Systems and Techniques · Sentiment Analysis and Opinion Mining · Topic Modeling
Methods: Early exiting using confidence measures
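
Early exiting with a confidence measure can be sketched generically as follows. This is an illustrative assumption about how such a rule might look, not the mechanism described in the paper: the function `predict_with_early_exit`, the additive-logit layers, and the threshold on max(p, 1 - p) are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_early_exit(x, layers, threshold=0.9):
    """Apply layers in order and stop once the prediction is confident enough.

    Each layer adds a contribution to a running logit. Confidence is taken
    to be max(p, 1 - p) for the current click probability p.
    Returns (probability, number_of_layers_used).
    """
    logit = 0.0
    p = 0.5
    for depth, layer in enumerate(layers, start=1):
        logit += layer(x)
        p = sigmoid(logit)
        if max(p, 1.0 - p) >= threshold:
            return p, depth  # exit early, skipping the remaining layers
    return p, len(layers)


# Hypothetical layers with fixed contributions, for illustration only.
layers = [lambda x: 0.5, lambda x: 0.5, lambda x: 3.0]
p_strict, depth_strict = predict_with_early_exit([1.0], layers, threshold=0.9)
p_loose, depth_loose = predict_with_early_exit([1.0], layers, threshold=0.7)
```

A lower threshold exits sooner and saves computation at the cost of acting on a less confident prediction; the stricter threshold here runs all three layers.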
