AutoShard: Automated Embedding Table Sharding for Recommender Systems
Daochen Zha, Louis Feng, Bhargav Bhushanam, Dhruv Choudhary, Jade Nie, Yuandong Tian, Jay Chae, Yinbin Ma, Arun Kejariwal, Xia Hu

TL;DR
AutoShard is a system that uses a neural cost model and deep reinforcement learning to shard large embedding tables in recommender systems automatically and efficiently, improving cost balance across devices and overall performance.
Contribution
AutoShard introduces a neural cost model combined with reinforcement learning to automate and optimize embedding table sharding, addressing NP-hard partitioning challenges.
Findings
AutoShard outperforms heuristic methods on large-scale datasets.
The learned policy transfers across different sharding scenarios.
AutoShard can shard hundreds of tables in seconds, making it suitable for production use.
Abstract
Embedding learning is an important technique in deep recommendation models to map categorical features to dense vectors. However, the embedding tables often demand an extremely large number of parameters, which become the storage and efficiency bottlenecks. Distributed training solutions have been adopted to partition the embedding tables into multiple devices. However, the embedding tables can easily lead to imbalances if not carefully partitioned. This is a significant design challenge of distributed systems named embedding table sharding, i.e., how we should partition the embedding tables to balance the costs across devices, which is a non-trivial task because 1) it is hard to efficiently and precisely measure the cost, and 2) the partition problem is known to be NP-hard. In this work, we introduce our novel practice in Meta, namely AutoShard, which uses a neural cost model to…
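To make the balancing problem in the abstract concrete, here is a minimal sketch of a common heuristic baseline: greedy "largest cost first" assignment. This is not AutoShard's method. It assumes each table has a single known, additive cost, which is exactly the simplification the abstract says is hard to justify in practice. The function name `greedy_shard` and the cost values are hypothetical and chosen only for illustration.

```python
import heapq


def greedy_shard(table_costs, num_devices):
    """Greedy baseline (not AutoShard): place each table, largest cost
    first, on the device with the smallest accumulated cost so far.

    Assumes per-table costs are known and simply add up on a device.
    """
    # Min-heap of (accumulated cost, device index).
    heap = [(0.0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_devices)]

    # Visit tables in order of decreasing cost.
    order = sorted(range(len(table_costs)),
                   key=lambda i: table_costs[i], reverse=True)
    for i in order:
        load, d = heapq.heappop(heap)          # least-loaded device
        assignment[d].append(i)
        heapq.heappush(heap, (load + table_costs[i], d))

    loads = [sum(table_costs[i] for i in shard) for shard in assignment]
    return assignment, loads


# Hypothetical per-table costs (arbitrary units), sharded over 2 devices.
costs = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0]
assignment, loads = greedy_shard(costs, num_devices=2)
print(assignment, loads)
```

With these made-up costs the heuristic happens to balance both devices exactly, but it offers no such guarantee in general, and it depends on cost estimates being accurate and additive. Those are the two gaps the abstract identifies as motivation for a learned cost model and a learned sharding policy.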
Taxonomy
Topics: Recommender Systems and Techniques · Human Mobility and Location-Based Analysis · Video Surveillance and Tracking Methods
