A Bag of Tricks for Scaling CPU-based Deep FFMs to more than 300m Predictions per Second
Blaž Škrlj, Benjamin Ben-Shalom, Grega Gašperšič, Adi Schwartz, Ramzi Hoseisi, Naama Ziporin, Davorin Kopič, Andraž Tori

TL;DR
This paper presents a highly optimized, open-source, CPU-based implementation of Deep FFMs that serves more than 300 million predictions per second, cuts the bandwidth of cross-data-center weight transfers by more than an order of magnitude, and is deployed across multiple data centers.
Contribution
It introduces training and inference optimizations, a weight quantization technique, and a CPU-only deployment of Deep FFMs, which the authors describe as one of the first successful deployments of its kind at this scale.
Findings
Achieved over 300 million predictions per second on CPU-only systems.
Implemented weight quantization that reduced the bandwidth used for cross-data-center weight transfers by more than an order of magnitude.
Demonstrated successful deployment in multi-data-center environments.
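To make the weight quantization finding concrete, here is a minimal sketch of generic uniform 16-bit quantization. It is not the authors' scheme: this summary does not describe how their quantization works, and the function names, the 16-bit width, and the sample weights are all assumptions chosen for illustration. On its own, this approach only halves a 32-bit payload, so it does not account for the order-of-magnitude reduction reported above.

```python
from array import array

LEVELS = (1 << 16) - 1  # largest 16-bit code (assumed width, not from the paper)


def quantize(weights):
    """Map floats to 16-bit integer codes using a shared offset and step."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / LEVELS if hi > lo else 1.0
    # Clamp to guard against floating-point overshoot at the upper edge.
    codes = array("H", (min(LEVELS, round((w - lo) / step)) for w in weights))
    return lo, step, codes


def dequantize(lo, step, codes):
    """Recover approximate floats from the codes."""
    return [lo + c * step for c in codes]


weights = [-0.75, -0.1, 0.0, 0.33, 1.25]  # hypothetical sample weights
lo, step, codes = quantize(weights)
restored = dequantize(lo, step, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Only `lo`, `step`, and the integer codes need to be transferred; the receiver rebuilds each weight to within half a step of its original value.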
Abstract
Field-aware Factorization Machines (FFMs) have emerged as a powerful model for click-through rate prediction, particularly excelling in capturing complex feature interactions. In this work, we present an in-depth analysis of our in-house, Rust-based Deep FFM implementation, and detail its deployment on a CPU-only, multi-data-center scale. We overview key optimizations devised for both training and inference, demonstrated by previously unpublished benchmark results in efficient model search and online training. Further, we detail an in-house weight quantization that resulted in more than an order of magnitude reduction in bandwidth footprint related to weight transfers across data-centres. We disclose the engine and associated techniques under an open-source license to contribute to the broader machine learning community. This paper showcases one of the first successful CPU-only…
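For readers unfamiliar with FFMs, the field-aware interaction the abstract refers to can be sketched as follows. This is the standard FFM scoring rule, in which feature i interacts with feature j through the latent vector that i keeps for j's field. It is not the authors' Rust engine, it omits the "Deep" component entirely, and every name and number below is a hypothetical chosen for illustration.

```python
import math


def ffm_score(active, field_of, latent, linear, bias=0.0):
    """Standard FFM score for one example.

    active:   feature name -> feature value
    field_of: feature name -> field name
    latent:   (feature name, field name) -> latent vector (list of floats)
    linear:   feature name -> linear weight
    """
    score = bias + sum(linear.get(f, 0.0) * x for f, x in active.items())
    feats = sorted(active)
    for a in range(len(feats)):
        for b in range(a + 1, len(feats)):
            i, j = feats[a], feats[b]
            v_i = latent[(i, field_of[j])]  # i's vector for j's field
            v_j = latent[(j, field_of[i])]  # j's vector for i's field
            dot = sum(p * q for p, q in zip(v_i, v_j))
            score += dot * active[i] * active[j]
    return score


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-feature example.
active = {"site=news": 1.0, "ad=shoes": 1.0}
field_of = {"site=news": "site", "ad=shoes": "ad"}
latent = {
    ("site=news", "ad"): [0.5, 1.0],
    ("ad=shoes", "site"): [2.0, 1.0],
}
linear = {"site=news": 0.25, "ad=shoes": -0.25}

score = ffm_score(active, field_of, latent, linear, bias=-2.0)
ctr = sigmoid(score)
```

The pairwise loop is quadratic in the number of active features, which is one reason throughput-oriented implementations invest heavily in optimizing this step.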
Taxonomy
Topics: Model Reduction and Neural Networks · Magnetic Properties and Applications · Ferroelectric and Negative Capacitance Devices
