Secure Distributed Training at Scale

Eduard Gorbunov; Alexander Borzunov; Michael Diskin; Max Ryabinin

arXiv:2106.11257 · cs.LG · January 3, 2023

TL;DR

This paper introduces a communication-efficient, secure decentralized training protocol that is resilient to Byzantine failures, enabling collaborative large-scale deep learning without sacrificing efficiency.

Contribution

We propose a novel Byzantine-tolerant decentralized training protocol that significantly improves communication efficiency for large-scale deep learning.

Findings

01. Achieves Byzantine tolerance in decentralized training.

02. Reduces communication overhead compared to existing methods.

03. Enables scalable secure collaborative model training.

Abstract

Many areas of deep learning benefit from using increasingly larger neural networks trained on public data, as is the case for pre-trained models for NLP and computer vision. Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers. One way to address this is for several smaller groups to pool their computational resources and train a model that benefits all participants. Unfortunately, in this case, any participant can jeopardize the entire training run by sending incorrect updates, deliberately or by mistake. Training in the presence of such peers requires specialized distributed training algorithms with Byzantine tolerance. These algorithms often sacrifice efficiency by introducing redundant communication or passing all updates through a trusted server, making it infeasible to…
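The claim that a single participant can jeopardize a training run can be illustrated with a small sketch. It is not the protocol proposed in the paper: it swaps in coordinate-wise median, a standard robust aggregator, purely to contrast it with plain averaging. The function names and the gradient values are hypothetical.

```python
from statistics import median


def mean_aggregate(updates):
    """Coordinate-wise mean of equally sized update vectors."""
    n = len(updates)
    return [sum(coords) / n for coords in zip(*updates)]


def median_aggregate(updates):
    """Coordinate-wise median, a simple robust alternative to the mean."""
    return [median(coords) for coords in zip(*updates)]


# Hypothetical gradients: four honest peers roughly agree, one peer is faulty.
honest = [[1.0, -2.0], [1.1, -1.9], [0.9, -2.1], [1.0, -2.0]]
faulty = [[1000.0, 1000.0]]
updates = honest + faulty

mean_result = mean_aggregate(updates)      # dominated by the faulty peer
median_result = median_aggregate(updates)  # stays near the honest consensus

print("mean:  ", mean_result)
print("median:", median_result)
```

With plain averaging, one outlier shifts every coordinate of the aggregate by an arbitrary amount, whereas the median stays within the range of the honest values as long as honest peers form a majority. In this sketch every peer's update is gathered in one place before aggregation.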

Taxonomy

Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques