From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction
Bencheng Yan, Yuejie Lei, Zhiyuan Zeng, Di Wang, Kaiyi Lin, Pengjie Wang, Jian Xu, Bo Zheng

TL;DR
This paper introduces the Field-Aware Transformer (FAT), a CTR prediction model that aligns model structure with data semantics, yielding better scaling, higher accuracy, and a formal account of model capacity.
Contribution
We propose FAT, a Transformer variant that embeds field-based interaction priors into attention, and establish a formal scaling law for CTR models based on Rademacher complexity.
Findings
FAT improves AUC by up to +0.51% on benchmarks.
FAT delivers +2.33% CTR and +0.66% RPM online.
Model performance follows a power-law scaling trend as width increases.
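A power-law trend of this kind is commonly checked with a linear fit in log-log space. The sketch below assumes the form L(w) = a · w^(-b) for a loss-like metric L at model width w. The function name fit_power_law and the synthetic data are hypothetical and illustrative only; they are not taken from the paper.

```python
import numpy as np

def fit_power_law(widths, losses):
    """Fit losses ~= a * widths**(-b) by least squares in log-log space.

    Returns (a, b). Assumes all inputs are strictly positive.
    """
    log_w = np.log(np.asarray(widths, dtype=float))
    log_l = np.log(np.asarray(losses, dtype=float))
    # A power law is a straight line in log-log space: log L = log a - b * log w
    slope, intercept = np.polyfit(log_w, log_l, deg=1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic, noise-free curve (hypothetical values, not results from the paper)
widths = np.array([64, 128, 256, 512, 1024], dtype=float)
losses = 0.9 * widths ** -0.15
a, b = fit_power_law(widths, losses)
print(f"a={a:.3f}, b={b:.3f}")  # a=0.900, b=0.150
```

With real measurements the points will scatter around the line, and the fitted exponent b summarizes how quickly the metric improves as width grows.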
Abstract
Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality, while CTR data demand combinatorial reasoning over high-cardinality semantic fields. Unstructured attention spreads capacity indiscriminately, amplifying noise under extreme sparsity and breaking scalable learning. To restore alignment, we introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation. This design ensures model complexity scales with the number of fields F, not the total vocabulary size n >> F, leading to tighter generalization and, critically,…
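The abstract does not give FAT's attention equations. As a rough illustration only, the sketch below assumes one plausible reading: "decomposed content alignment" as per-field query/key projections, and "cross-field modulation" as a learned F x F matrix M that rescales the attention logit for each field pair. All names (field_aware_attention, W_q, W_k, W_v, M) are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def field_aware_attention(X, W_q, W_k, W_v, M):
    """Hypothetical field-aware self-attention over F field embeddings.

    X:        (F, d)      one embedding per field (looked up from a size-n vocabulary)
    W_q, W_k: (F, d, d_k) per-field projections (assumed "content alignment")
    W_v:      (d, d_v)    shared value projection
    M:        (F, F)      per-field-pair logit modulation (assumed "cross-field modulation")
    """
    F = X.shape[0]
    assert M.shape == (F, F), "M must be F x F"
    # Each field projects its own embedding with its own query/key matrices.
    Q = np.einsum("fd,fdk->fk", X, W_q)
    K = np.einsum("fd,fdk->fk", X, W_k)
    V = X @ W_v
    logits = (Q @ K.T) / np.sqrt(Q.shape[-1])
    # Rescale each (i, j) logit by a learned weight for the field pair (i, j).
    logits = logits * M
    A = softmax(logits, axis=-1)
    return A @ V, A

# Usage with random (hypothetical) parameters
rng = np.random.default_rng(0)
F, d, d_k, d_v = 4, 8, 8, 8
X = rng.standard_normal((F, d))
W_q = rng.standard_normal((F, d, d_k)) * 0.1
W_k = rng.standard_normal((F, d, d_k)) * 0.1
W_v = rng.standard_normal((d, d_v)) * 0.1
M = np.ones((F, F))  # all ones: no cross-field modulation yet
out, A = field_aware_attention(X, W_q, W_k, W_v, M)
print(out.shape)  # (4, 8)
```

In this sketch the attention parameters (W_q, W_k, W_v, M) depend only on F and the embedding sizes; the vocabulary size n enters only through the embedding table that produces X. That is one way to read the claim that complexity scales with F rather than n.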
