ACE-Sync: An Adaptive Cloud-Edge Synchronization Framework for Communication-Efficient Large-Scale Distributed Model Training
Yi Yang, Ziyu Lin, Liesheng Wei

TL;DR
ACE-Sync is an adaptive framework that reduces communication costs in large-scale distributed training by dynamically selecting parameter synchronization strategies, achieving significant bandwidth savings while maintaining high model accuracy.
Contribution
It introduces a novel adaptive synchronization framework combining gradient importance prediction, differentiated compression, and hierarchical coordination for efficient distributed training.
Findings
Reduced communication from 112.5 GB to 44.7 GB (60% reduction).
Reduced the number of epochs to convergence from 41 to 39.
Maintained high accuracy with only 0.3% drop from baseline.
Abstract
Large-scale deep learning models impose substantial communication overhead in distributed training, particularly in bandwidth-constrained or heterogeneous cloud-edge environments. Conventional synchronous or fixed-compression techniques often struggle to balance communication cost, convergence stability, and model accuracy. To address these challenges, we propose ACE-Sync, an Adaptive Cloud-Edge Synchronization Framework that integrates (1) an attention-based gradient importance predictor, (2) a differentiated parameter compression strategy, and (3) a hierarchical cloud-edge coordination mechanism. ACE-Sync dynamically selects which parameter groups to synchronize and determines appropriate compression levels under per-device bandwidth budgets. A knapsack-based optimization strategy is adopted to maximize important gradient preservation while reducing redundant communication.…
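The abstract does not spell out the knapsack formulation, so the following is only a minimal sketch of how such a selection could look, not the authors' method. It assumes a multiple-choice knapsack: each parameter group offers a few hypothetical options (for example, full precision or a compressed form), each with an integer bandwidth cost and a retained-importance score, and at most one option per group may be chosen within a per-device budget. The function name `select_sync_plan`, the option tuples, and all numbers are illustrative assumptions; in ACE-Sync the importance scores would come from the gradient importance predictor.

```python
from typing import List, Tuple

# Hypothetical option format: (bandwidth cost in integer units, retained importance).
Option = Tuple[int, float]


def select_sync_plan(groups: List[List[Option]], budget: int) -> Tuple[float, List[int]]:
    """Multiple-choice knapsack by dynamic programming (illustrative sketch).

    For each parameter group, pick at most one option so that the total cost
    stays within `budget` and the summed importance is maximal.
    Returns (total importance, chosen option index per group; -1 means skip).
    """
    # best[b] = maximum importance achievable with total cost <= b.
    best = [0.0] * (budget + 1)
    # choice[g][b] = option picked for group g at capacity b (-1 = skip).
    choice = [[-1] * (budget + 1) for _ in groups]

    for g, options in enumerate(groups):
        new_best = best[:]  # default: skip this group entirely
        for b in range(budget + 1):
            for k, (cost, value) in enumerate(options):
                if cost <= b and best[b - cost] + value > new_best[b]:
                    new_best[b] = best[b - cost] + value
                    choice[g][b] = k
        best = new_best

    # Walk backwards through the groups to recover the chosen options.
    plan = [-1] * len(groups)
    b = budget
    for g in range(len(groups) - 1, -1, -1):
        k = choice[g][b]
        plan[g] = k
        if k >= 0:
            b -= groups[g][k][0]
    return best[budget], plan


# Made-up example: three groups, each with a "full" and a "compressed" option.
example_groups = [
    [(4, 0.9), (2, 0.6)],
    [(4, 0.5), (2, 0.3)],
    [(4, 0.2), (1, 0.1)],
]
total, plan = select_sync_plan(example_groups, budget=6)
```

With these made-up numbers and a budget of 6 units, the sketch sends the first group at full precision, the second in compressed form, and skips the third, which illustrates how a budget can force different compression levels on groups of differing importance.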
Taxonomy
Topics: Advanced Neural Network Applications · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
