LLaTTE: Scaling Laws for Multi-Stage Sequence Modeling in Large-Scale Ads Recommendation

Lee Xiong; Zhirong Chen; Rahul Mayuranath; Shangran Qiu; Arda Ozdemir; Lu Li; Yang Hu; Dave Li; Jingtao Ren; Howard Cheng; Fabian Souto Herrera; Ahmed Agiza; Baruch Epshtein; Anuj Aggarwal; Julia Ulziisaikhan; Chao Wang; Dinesh Ramasamy; Parshva Doshi; Sri Reddy; Arnold Overwijk

arXiv:2601.20083 · cs.IR · January 29, 2026

Open Access

TL;DR

This paper introduces LLaTTE, a scalable transformer architecture for ads recommendation. It shows that sequence modeling in recommendation follows power-law scaling, that semantic features are a prerequisite for that scaling, and that a two-stage design lets large models be used while keeping serving latency low in industrial settings.

Contribution

It demonstrates that recommendation sequence modeling follows predictable scaling laws and introduces a two-stage architecture to effectively utilize large models under latency constraints.

Findings

01. Scaling laws apply to recommendation sequence modeling.

02. Semantic features are essential for effective scaling.

03. The multi-stage model achieves a 4.3% conversion uplift at Meta.

Abstract

We present LLaTTE (LLM-Style Latent Transformers for Temporal Events), a scalable transformer architecture for production ads recommendation. Through systematic experiments, we demonstrate that sequence modeling in recommendation systems follows predictable power-law scaling similar to LLMs. Crucially, we find that semantic features bend the scaling curve: they are a prerequisite for scaling, enabling the model to effectively utilize the capacity of deeper and longer architectures. To realize the benefits of continued scaling under strict latency constraints, we introduce a two-stage architecture that offloads the heavy computation of large, long-context models to an asynchronous upstream user model. We demonstrate that upstream improvements transfer predictably to downstream ranking tasks. Deployed as the largest user model at Meta, this multi-stage framework drives a 4.3% conversion…
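As a rough illustration of what "predictable power-law scaling" means in practice, the sketch below fits a curve of the form loss ≈ a · compute^(−b) by ordinary least squares in log-log space. The functional form, the helper name `fit_power_law`, and the data points are all assumptions made for illustration: the data is synthetic, generated from a known law, and is not taken from the paper, whose exact scaling variables and fitted exponents are not stated in this abstract.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) via least squares on (log compute, log loss).

    Hypothetical helper for illustration; not the paper's fitting procedure.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - b * log(compute), so a = exp(intercept), b = -slope.
    return math.exp(intercept), -slope

# Synthetic points generated from loss = 2.0 * compute**(-0.05) (assumed values).
compute = [1e18, 1e19, 1e20, 1e21]
loss = [2.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"a = {a:.4f}, b = {b:.4f}")
```

A straight line in log-log space is what makes such a trend usable for planning: once `a` and `b` are estimated from smaller runs, the loss at a larger compute budget can be extrapolated as `a * budget ** -b`, subject to the usual caveat that the trend may not hold outside the measured range.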

Taxonomy

Topics: Recommender Systems and Techniques · Complex Network Analysis Techniques · Sentiment Analysis and Opinion Mining