From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

Bencheng Yan; Yuejie Lei; Zhiyuan Zeng; Di Wang; Kaiyi Lin; Pengjie Wang; Jian Xu; Bo Zheng

arXiv:2511.12081 · cs.IR · November 18, 2025

TL;DR

This paper introduces the Field-Aware Transformer (FAT), a Transformer variant for click-through rate (CTR) prediction that aligns model structure with the semantic field structure of the data. The result is better scaling, higher accuracy, and a formal account of model capacity.

Contribution

We propose FAT, a Transformer variant that embeds field-based interaction priors into attention, and establish a formal scaling law for CTR models grounded in Rademacher complexity.
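The contribution above rests on Rademacher complexity. As a generic illustration of that quantity (not the paper's bound, which this summary does not state), the sketch below estimates the empirical Rademacher complexity of a norm-bounded linear class by Monte Carlo and compares it with the standard closed-form upper bound. The function names `empirical_rademacher_linear` and `jensen_upper_bound` are hypothetical.

```python
import math
import random


def empirical_rademacher_linear(xs, bound, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of the
    class {x -> <w, x> : ||w||_2 <= bound} on the fixed sample xs.

    By Cauchy-Schwarz the supremum over w has a closed form:
        sup_w (1/m) * sum_i s_i <w, x_i> = (bound / m) * ||sum_i s_i x_i||_2,
    so only the expectation over the random signs s_i is sampled.
    """
    rng = random.Random(seed)
    m, d = len(xs), len(xs[0])
    total = 0.0
    for _ in range(n_draws):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(m)]
        # Signed sum of the sample vectors: sum_i s_i * x_i.
        acc = [sum(s * x[k] for s, x in zip(signs, xs)) for k in range(d)]
        total += math.sqrt(sum(a * a for a in acc))
    return bound * total / (n_draws * m)


def jensen_upper_bound(xs, bound):
    """Standard bound via Jensen's inequality: (bound / m) * sqrt(sum_i ||x_i||^2)."""
    m = len(xs)
    return bound * math.sqrt(sum(sum(v * v for v in x) for x in xs)) / m


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)]
    est = empirical_rademacher_linear(sample, bound=1.0)
    ub = jensen_upper_bound(sample, bound=1.0)
    print(f"estimate={est:.4f}  bound={ub:.4f}")
```

For inputs of roughly constant norm, the upper bound decays roughly as 1/sqrt(m) in the sample size m, which is the kind of capacity term a Rademacher-based generalization argument controls.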

Findings

1. FAT improves AUC by up to +0.51% on offline benchmarks.
2. Online, FAT delivers +2.33% CTR and +0.66% RPM (revenue per mille).
3. Model performance follows power-law scaling as model width increases.
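The third finding refers to a power law in model width. A common way to check such a trend, sketched here under the assumption of a pure power law loss ≈ a * w^(-b) with no irreducible-loss offset, is a least-squares line fit in log-log space. The helper name `fit_power_law` and the data points are hypothetical; the points are generated synthetically from a known a and b, not taken from the paper.

```python
import math


def fit_power_law(widths, losses):
    """Fit loss ~ a * width**(-b) by ordinary least squares in log-log space."""
    xs = [math.log(w) for w in widths]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx            # slope of the log-log line, equals -b
    intercept = my - slope * mx  # intercept, equals log(a)
    return math.exp(intercept), -slope


# Synthetic, noise-free points from a = 2.0, b = 0.3 (illustrative only,
# not measurements from the paper).
widths = [64, 128, 256, 512, 1024]
losses = [2.0 * w ** -0.3 for w in widths]
a, b = fit_power_law(widths, losses)
print(f"a={a:.3f}, b={b:.3f}")  # prints: a=2.000, b=0.300
```

With real measurements the points scatter around the line, and a fit that includes an additive floor term is often preferred; the log-log form is kept here because it reduces to a two-parameter linear regression.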

Abstract

Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns, in stark contrast to the smooth, predictable gains seen in large language models. We identify the root cause as a structural misalignment: Transformers assume sequential compositionality, while CTR data demand combinatorial reasoning over high-cardinality semantic fields. Unstructured attention spreads capacity indiscriminately, amplifying noise under extreme sparsity and breaking scalable learning. To restore alignment, we introduce the Field-Aware Transformer (FAT), which embeds field-based interaction priors into attention through decomposed content alignment and cross-field modulation. This design ensures model complexity scales with the number of fields F, not the total vocabulary size n ≫ F, leading to tighter generalization and, critically,…
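The abstract names two mechanisms, decomposed content alignment and cross-field modulation, without giving their formulas here. The following is one hypothetical way to realize them in a single attention head, written in plain Python so it runs without dependencies: each field gets its own query/key/value projections (content alignment), and a learned scalar per field pair rescales the content score (cross-field modulation). The class name `FieldAwareAttentionSketch`, the per-field projections, and the multiplicative gate are assumptions for illustration, not FAT's published design. Its parameter count depends on the number of fields F and the embedding widths, not on the vocabulary size n, which loosely mirrors the abstract's claim.

```python
import math
import random


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class FieldAwareAttentionSketch:
    """Hypothetical single-head field-aware attention.

    Assumptions (not from the paper): each field f has its own query/key/value
    projections, and a learned scalar gate[f][g] scales the content score
    between a token of field f and a token of field g.
    """

    def __init__(self, num_fields, d_model, d_head, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            scale = 1.0 / math.sqrt(cols)
            return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

        # Per-field projections: parameter count grows with F, not with n.
        self.wq = [init(d_head, d_model) for _ in range(num_fields)]
        self.wk = [init(d_head, d_model) for _ in range(num_fields)]
        self.wv = [init(d_head, d_model) for _ in range(num_fields)]
        # Cross-field modulation: one scalar per (field, field) pair.
        self.gate = [[1.0] * num_fields for _ in range(num_fields)]
        self.d_head = d_head

    def num_params(self):
        F = len(self.wq)
        per_proj = len(self.wq[0]) * len(self.wq[0][0])
        return 3 * F * per_proj + F * F

    def __call__(self, embeddings, fields):
        """embeddings: one d_model vector per token; fields: field index per token."""
        q = [matvec(self.wq[f], e) for e, f in zip(embeddings, fields)]
        k = [matvec(self.wk[f], e) for e, f in zip(embeddings, fields)]
        v = [matvec(self.wv[f], e) for e, f in zip(embeddings, fields)]
        out = []
        for i, fi in enumerate(fields):
            # Content alignment (scaled dot product), modulated by the field-pair gate.
            scores = [
                self.gate[fi][fj] * sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(self.d_head)
                for j, fj in enumerate(fields)
            ]
            weights = softmax(scores)
            out.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(self.d_head)])
        return out


if __name__ == "__main__":
    layer = FieldAwareAttentionSketch(num_fields=4, d_model=8, d_head=4)
    rng = random.Random(1)
    tokens = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
    out = layer(tokens, fields=[0, 1, 2, 3])
    print(len(out), len(out[0]), layer.num_params())  # prints: 4 4 400
```

Embedding tables, which do grow with n, sit outside this layer and are not counted by `num_params`.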
