Theoretical Guarantees for LT-TTD: A Unified Transformer-based Architecture for Two-Level Ranking Systems
Ayoub Abraich

TL;DR
This paper introduces LT-TTD, a unified transformer-based architecture for two-level ranking systems that theoretically improves over traditional decoupled models by reducing error propagation and enhancing ranking quality.
Contribution
It proposes a unified architecture that combines retrieval and ranking in a single transformer framework, backed by formal guarantees, and introduces a new evaluation metric, UPQE.
Findings
LT-TTD reduces irretrievable relevant items by a factor depending on distillation strength.
The multi-objective optimization achieves a better global optimum than disjoint training.
The approach maintains practical computational complexity.
Abstract
Modern recommendation and search systems typically employ multi-stage ranking architectures to efficiently handle billions of candidates. The conventional approach uses distinct L1 (candidate retrieval) and L2 (re-ranking) models with different optimization objectives, introducing critical limitations including irreversible error propagation and suboptimal ranking. This paper identifies and analyzes the fundamental limitations of this decoupled paradigm and proposes LT-TTD (Listwise Transformer with Two-Tower Distillation), a novel unified architecture that bridges retrieval and ranking phases. Our approach combines the computational efficiency of two-tower models with the expressivity of transformers in a unified listwise learning framework. We provide a comprehensive theoretical analysis of our architecture and establish formal guarantees regarding error propagation mitigation,…
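The two-level pipeline described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the toy embeddings, the single self-attention pass standing in for the listwise transformer, the choice of the listwise scores as teacher, and the KL-divergence form of the distillation loss are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def two_tower_scores(query_vec, item_vecs):
    """L1-style scores: one dot product per (query, item) pair."""
    return [dot(query_vec, item) for item in item_vecs]

def top_k(scores, k):
    """Indices of the k highest-scoring items, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def listwise_scores(query_vec, cand_vecs):
    """L2-style stand-in (assumption): one self-attention pass over the
    candidate list, so each score depends on the other candidates."""
    scale = math.sqrt(len(query_vec))
    contextual = []
    for c in cand_vecs:
        weights = softmax([dot(c, other) / scale for other in cand_vecs])
        mixed = [sum(w * other[d] for w, other in zip(weights, cand_vecs))
                 for d in range(len(c))]
        contextual.append(mixed)
    return [dot(query_vec, ctx) for ctx in contextual]

def distillation_loss(teacher_scores, student_scores):
    """KL(teacher || student) over the candidate list (assumed loss form)."""
    p = softmax(teacher_scores)
    q = softmax(student_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy data (hypothetical): one query and four items in a 2-d embedding space.
query = [1.0, 0.0]
items = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3], [-0.5, 0.5]]

l1 = two_tower_scores(query, items)                  # cheap retrieval scores
candidates = top_k(l1, k=3)                          # items passed on to L2
l2 = listwise_scores(query, [items[i] for i in candidates])
loss = distillation_loss(l2, [l1[i] for i in candidates])
```

In a decoupled system the item dropped by `top_k` can never be recovered by the re-ranker. A distillation term of this kind is one way the retrieval scores could be pulled toward the listwise scores during joint training.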
Taxonomy
Topics: Information Retrieval and Search Behavior · Recommender Systems and Techniques · Expert finding and Q&A systems
Methods: Linear Layer · Multi-Head Attention · Dense Connections · Adam · Attention Is All You Need · Dropout · Knowledge Distillation · Layer Normalization · Position-Wise Feed-Forward Layer · Byte Pair Encoding
