Binary Code based Hash Embedding for Web-scale Applications

Bencheng Yan; Pengjie Wang; Jinquan Liu; Wei Lin; Kuang-Chih Lee; Jian Xu; Bo Zheng

arXiv:2109.02471·cs.IR·September 7, 2021

TL;DR

This paper introduces a binary code based hash embedding technique that shrinks the embedding table in web-scale deep learning applications, retaining about 99% of the original performance with a 1000x smaller table.

Contribution

The paper proposes a binary code based hash embedding method that allows the embedding table to be reduced to an arbitrary scale without substantial performance loss.

Findings

01 Achieves 99% of original performance with a 1000x smaller embedding table

02 Reduces memory costs significantly for web-scale applications

03 Maintains effectiveness of embedding learning in large-scale systems

Abstract

Nowadays, deep learning models are widely adopted in web-scale applications such as recommender systems and online advertising. In these applications, embedding learning of categorical features is crucial to the success of deep learning models. The standard method assigns each categorical feature value a unique embedding vector that can be learned and optimized. Although this method captures the characteristics of the categorical features well and promises good performance, it can incur a huge memory cost to store the embedding table, especially for web-scale applications. Such a huge memory cost significantly holds back the effectiveness and usability of EDRMs. In this paper, we propose a binary code based hash embedding method which allows the size of the embedding table to be reduced to an arbitrary scale without compromising too much performance.…
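The abstract is truncated and does not describe the mechanism in detail, so the following is only an illustrative sketch of how a binary code can index a small embedding table. It is not taken from the paper. Everything in it is an assumption made for illustration: the SHA-256 hash, the 32-bit code length, splitting the code into fixed 8-bit blocks, one small table per block, summing the block vectors, and the names `binary_code` and `BlockHashEmbedding`.

```python
import hashlib
import random


def binary_code(feature_value: str, n_bits: int = 32) -> str:
    """Map a categorical value to a fixed-length binary string.

    Assumption: SHA-256 truncated to n_bits; the paper's hash may differ.
    """
    digest = hashlib.sha256(feature_value.encode("utf-8")).digest()
    as_int = int.from_bytes(digest[:8], "big") % (1 << n_bits)
    return format(as_int, f"0{n_bits}b")


class BlockHashEmbedding:
    """Hypothetical block-wise lookup: one small table per block of bits."""

    def __init__(self, n_bits: int = 32, block_bits: int = 8,
                 dim: int = 4, seed: int = 0) -> None:
        if n_bits % block_bits != 0:
            raise ValueError("n_bits must be a multiple of block_bits")
        self.n_bits = n_bits
        self.block_bits = block_bits
        self.dim = dim
        self.n_blocks = n_bits // block_bits
        rng = random.Random(seed)
        # Each block gets its own table with 2**block_bits rows.
        self.tables = [
            [[rng.gauss(0.0, 0.01) for _ in range(dim)]
             for _ in range(1 << block_bits)]
            for _ in range(self.n_blocks)
        ]

    def n_rows(self) -> int:
        """Total number of embedding rows stored across all block tables."""
        return self.n_blocks * (1 << self.block_bits)

    def lookup(self, feature_value: str) -> list:
        """Sum the rows selected by each block of the value's binary code."""
        code = binary_code(feature_value, self.n_bits)
        out = [0.0] * self.dim
        for b in range(self.n_blocks):
            start = b * self.block_bits
            idx = int(code[start:start + self.block_bits], 2)
            row = self.tables[b][idx]
            out = [o + r for o, r in zip(out, row)]
        return out


emb = BlockHashEmbedding()
vec = emb.lookup("item_12345")  # "item_12345" is a made-up feature value
```

With these assumed settings the sketch stores 4 x 256 = 1024 rows while addressing 2^32 distinct codes, which illustrates how the number of stored rows can be decoupled from the number of feature values. Changing `block_bits` trades table size against how many values share rows.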

Taxonomy

Topics: Text and Document Classification Technologies · Caching and Content Delivery · Spam and Phishing Detection