Parameter Database: Data-centric Synchronization for Scalable Machine Learning
Naman Goel, Divyakant Agrawal, Sanjay Chawla, Ahmed Elmagarmid

TL;DR
This paper introduces a data-centric synchronization framework for distributed machine learning that exploits the iterative nature of ML algorithms to improve throughput while preserving sequential correctness. Experiments suggest a substantial improvement over bulk synchronous parallel (BSP) execution.
Contribution
It presents a novel data-centric synchronization approach that relaxes BSP constraints, enabling more efficient distributed ML without sacrificing correctness.
Findings
Substantial performance improvements over BSP
Maintains sequential correctness in distributed ML tasks
Effective use of stale updates to increase throughput
Abstract
We propose a new data-centric synchronization framework for carrying out machine learning (ML) tasks in a distributed environment. Our framework exploits the iterative nature of ML algorithms and relaxes the application-agnostic bulk synchronous parallel (BSP) paradigm that has previously been used for distributed machine learning. Data-centric synchronization complements function-centric synchronization, which uses stale updates to increase the throughput of distributed ML computations. Experiments to validate our framework suggest that we can attain substantial improvement over BSP while guaranteeing sequential correctness of ML tasks.
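To make the contrast between BSP and stale updates concrete, here is a minimal sketch of the generic bounded-staleness rule. It illustrates only the function-centric idea mentioned in the abstract, not the paper's data-centric method, which the abstract does not specify. The names `may_start`, `simulate`, `step_costs`, and `staleness` are hypothetical, and the tick-based cost model is an assumption made for illustration.

```python
from typing import List


def may_start(clock: int, clocks: List[int], staleness: int) -> bool:
    """A worker may begin its next iteration only if it is at most
    `staleness` iterations ahead of the slowest worker.
    staleness == 0 reproduces the BSP barrier."""
    return clock - min(clocks) <= staleness


def simulate(step_costs: List[int], staleness: int, ticks: int) -> List[int]:
    """Toy discrete-time model (an assumption, not the paper's setup).

    step_costs[i] is the number of ticks worker i needs per iteration
    (must be >= 1). Returns the iterations completed by each worker.
    """
    n = len(step_costs)
    clocks = [0] * n       # completed iterations per worker
    remaining = [0] * n    # ticks left in the current iteration; 0 = idle
    for _ in range(ticks):
        # Phase 1: idle workers start a new iteration if the rule allows.
        for i in range(n):
            if remaining[i] == 0 and may_start(clocks[i], clocks, staleness):
                remaining[i] = step_costs[i]
        # Phase 2: every running worker advances by one tick.
        for i in range(n):
            if remaining[i] > 0:
                remaining[i] -= 1
                if remaining[i] == 0:
                    clocks[i] += 1
    return clocks


if __name__ == "__main__":
    # One fast worker (1 tick/iteration) and one slow worker (2 ticks).
    print(simulate([1, 2], staleness=0, ticks=8))  # BSP: [4, 4]
    print(simulate([1, 2], staleness=2, ticks=8))  # bounded staleness: [6, 4]
```

In this toy run the fast worker idles every other tick under the barrier, while a staleness bound of 2 lets it complete two extra iterations in the same time. Whether those extra iterations preserve correctness is the question the paper's framework addresses.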
Taxonomy
Topics: Distributed systems and fault tolerance · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices
