Training with Multi-Layer Embeddings for Model Reduction

Benjamin Ghaemmaghami, Zihao Deng, Benjamin Cho, Leo Orshansky, Ashish Kumar Singh, Mattan Erez, and Michael Orshansky

arXiv:2006.05623 · cs.LG · June 11, 2020 · 5 cites

Open Access

TL;DR

This paper proposes a multi-layer embedding training method for recommendation models. Embeddings are trained through factorized linear layers, which allows a smaller embedding size at the same accuracy, and the trained multi-layer solution is then converted into a single-layer model.

Contribution

Introduces a novel multi-layer embedding training architecture that improves embedding efficiency and reduces model size in recommendation systems, with theoretical analysis and practical implementation.

Findings

01 Achieves a 4-8X reduction in embedding size without accuracy loss

02 Demonstrates effectiveness on CTR prediction benchmarks

03 Improves memory efficiency at the cost of a 25% runtime overhead

Abstract

Modern recommendation systems rely on real-valued embeddings of categorical features. Increasing the dimension of embedding vectors improves model accuracy but comes at a high cost to model size. We introduce a multi-layer embedding training (MLET) architecture that trains embeddings via a sequence of linear layers to achieve a superior trade-off between embedding accuracy and model size. Our approach is fundamentally based on the ability of factorized linear layers to produce embeddings superior to those of a single linear layer. We focus on the analysis and implementation of a two-layer scheme. Drawing on recent results on the dynamics of backpropagation in linear neural networks, we explain the superiority of multi-layer embeddings by their tendency to have lower effective rank. We show that substantial advantages are obtained in the regime where the width of the hidden layer is much…
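
The two-layer scheme and its conversion to a single layer can be illustrated with a minimal sketch. This is not the authors' implementation: the sizes n, k, d, the random initialization, and the helper names (matmul, make_matrix, two_layer_lookup, single_layer_lookup) are all assumptions made for illustration, and training itself is omitted. The sketch only shows that a lookup through two linear layers equals a lookup in their product, so the inference-time table needs n x d parameters.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def make_matrix(rows, cols, rng):
    """Random matrix standing in for trained weights (hypothetical init)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Hypothetical sizes, chosen only for illustration.
n = 6  # number of categories (rows in the embedding table)
k = 8  # hidden-layer width used during training
d = 2  # final embedding dimension

W1 = make_matrix(n, k, rng)  # first linear layer: one k-dim row per category
W2 = make_matrix(k, d, rng)  # second linear layer: maps k dims to d dims


def two_layer_lookup(idx):
    """Training-time path: select a row of W1, then project through W2."""
    return matmul([W1[idx]], W2)[0]


# After training, the two layers are collapsed into one n x d table.
E = matmul(W1, W2)


def single_layer_lookup(idx):
    """Inference-time path: a plain lookup in the collapsed table."""
    return E[idx]


train_params = n * k + k * d  # parameters held during two-layer training
infer_params = n * d          # parameters kept after the collapse

for i in range(n):
    a, b = two_layer_lookup(i), single_layer_lookup(i)
    assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```

Because both layers are linear, the collapse is exact up to floating-point rounding. In this sketch the extra parameters exist only during training; the table that would be deployed has the size of an ordinary d-dimensional embedding.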

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Graph Neural Networks · Stochastic Gradient Optimization Techniques