Training with Multi-Layer Embeddings for Model Reduction
Benjamin Ghaemmaghami, Zihao Deng, Benjamin Cho, Leo Orshansky, Ashish Kumar Singh, Mattan Erez, and Michael Orshansky

TL;DR
This paper proposes multi-layer embedding training (MLET), which trains recommendation-model embeddings through a factorized sequence of linear layers and then converts the multi-layer solution into a single-layer model, reducing embedding size while maintaining accuracy.
Contribution
Introduces a novel multi-layer embedding training architecture that improves embedding efficiency and reduces model size in recommendation systems, with theoretical analysis and practical implementation.
Findings
Achieves 4-8X reduction in embedding size without accuracy loss
Demonstrates effectiveness on CTR prediction benchmarks
Improves memory efficiency at the cost of a 25% runtime overhead
Abstract
Modern recommendation systems rely on real-valued embeddings of categorical features. Increasing the dimension of embedding vectors improves model accuracy but comes at a high cost to model size. We introduce a multi-layer embedding training (MLET) architecture that trains embeddings via a sequence of linear layers to derive a superior embedding accuracy vs. model size trade-off. Our approach is fundamentally based on the ability of factorized linear layers to produce embeddings superior to those of a single linear layer. We focus on the analysis and implementation of a two-layer scheme. Harnessing recent results on the dynamics of backpropagation in linear neural networks, we explain the ability to get superior multi-layer embeddings via their tendency to have lower effective rank. We show that substantial advantages are obtained in the regime where the width of the hidden layer is much…
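The two-layer scheme and the conversion to a single layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the names `W1`, `W2`, `E`, `n`, `k`, and `d` are assumptions introduced here, the sizes are hypothetical, and random values stand in for trained weights. The hidden width `k` is set larger than the final dimension `d` purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not taken from the paper):
# n categories, hidden width k, final embedding dimension d.
n, k, d = 1000, 64, 8

# Two-layer linear embedding as it would exist during training.
# Random values stand in for trained weights.
W1 = rng.normal(size=(n, k))  # first layer: one row per category
W2 = rng.normal(size=(k, d))  # second layer: projects hidden width k down to d


def embed_two_layer(idx):
    """Look up rows of W1, then project them with W2."""
    return W1[idx] @ W2


# Because both layers are linear, their product is a single n x d table
# that gives the same embeddings.
E = W1 @ W2


def embed_single_layer(idx):
    """Plain lookup in the collapsed table."""
    return E[idx]


idx = np.array([3, 17, 999])
same = np.allclose(embed_two_layer(idx), embed_single_layer(idx))

train_params = W1.size + W2.size  # parameters held during training
infer_params = E.size             # parameters kept after collapsing

print(same, train_params, infer_params)
```

In this sketch the collapsed table stores `n * d` values, so the inference-time size depends on `d` and not on the hidden width `k`. The extra parameters exist only while training.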
Taxonomy
Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques
