A Bi-layered Parallel Training Architecture for Large-scale Convolutional Neural Networks
Jianguo Chen, Kenli Li, Kashif Bilal, Xu Zhou, Keqin Li, and Philip S. Yu

TL;DR
This paper introduces a bi-layered parallel training architecture for large-scale CNNs that improves training efficiency in distributed environments through outer- and inner-layer parallelism, workload balancing, and asynchronous updates.
Contribution
The paper proposes a novel bi-layered parallel training architecture with strategies for data partitioning, asynchronous weight updates, and task scheduling to accelerate CNN training in distributed systems.
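This page does not describe the paper's data partitioning scheme. The sketch below shows one simple way to balance the workload, assuming each computing node's relative capacity is known in advance. The `capacities` list and the `partition_by_capacity` helper are hypothetical. Each node's data subset is sized in proportion to its capacity, and largest-remainder rounding makes every sample land in exactly one shard.

```python
def partition_by_capacity(num_samples, capacities):
    """Split sample indices 0..num_samples-1 into contiguous shards whose sizes
    are proportional to each node's (assumed, hypothetical) relative capacity."""
    if num_samples < 0 or not capacities or any(c <= 0 for c in capacities):
        raise ValueError("need num_samples >= 0 and positive capacities")
    total = sum(capacities)
    # Ideal (fractional) number of samples for each node.
    shares = [num_samples * c / total for c in capacities]
    sizes = [int(s) for s in shares]
    # Largest-remainder rounding: samples lost to truncation go to the nodes
    # with the largest fractional parts, so the sizes sum to num_samples.
    # sorted() is stable even with reverse=True, so ties favor earlier nodes.
    leftover = num_samples - sum(sizes)
    by_remainder = sorted(range(len(sizes)),
                          key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    # Convert the sizes into contiguous index ranges, one per node.
    shards, start = [], 0
    for size in sizes:
        shards.append(range(start, start + size))
        start += size
    return shards
```

For example, `partition_by_capacity(10, [1, 1, 2])` yields shards of sizes 3, 2 and 5. The node with twice the capacity receives twice the ideal share, and the single leftover sample goes to the first of the tied nodes.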
Findings
Significant reduction in training time for large-scale CNNs.
Maintained high accuracy despite parallelization.
Effective workload balancing and synchronization strategies.
Abstract
Benefiting from large-scale training datasets and complex network structures, Convolutional Neural Networks (CNNs) are widely applied in various fields and achieve high accuracy. However, training a CNN is very time-consuming, because large numbers of training samples and many iterations are required to obtain high-quality weight parameters. In this paper, we focus on the time-consuming training process of large-scale CNNs and propose a Bi-layered Parallel Training (BPT-CNN) architecture for distributed computing environments. BPT-CNN consists of two main components: (a) an outer-layer parallel training of multiple CNN subnetworks on separate data subsets, and (b) an inner-layer parallel training within each subnetwork. In the outer-layer parallelism, we address critical issues of distributed and parallel computing, including data communication, synchronization, and workload…
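In the outer layer, several subnetworks train on separate data subsets and, per the summary above, combine their work through asynchronous weight updates. This page does not give the paper's update rule, so the sketch below substitutes plain asynchronous SGD against a parameter server. Each worker thread stands in for one subnetwork. It pulls the current global weight, computes a gradient on its own subset, and pushes the result without waiting for the other workers. The one-parameter model `y = w * x`, the data, and all names (`ParameterServer`, `shard_gradient`, `worker`, `train`) are hypothetical illustrations, not the BPT-CNN implementation.

```python
import threading


class ParameterServer:
    """Holds the global weight and applies worker updates as they arrive."""

    def __init__(self, weight=0.0):
        self.weight = weight
        self._lock = threading.Lock()

    def pull(self):
        with self._lock:
            return self.weight

    def push(self, gradient, lr):
        # Applied immediately, without waiting for other workers; the gradient
        # may have been computed from a stale copy of the weight.
        with self._lock:
            self.weight -= lr * gradient


def shard_gradient(w, xs, ys):
    """Gradient of the mean squared error of the toy model y = w * x on one
    data subset (a hypothetical stand-in for a CNN subnetwork's loss)."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def worker(server, xs, ys, rounds, lr):
    for _ in range(rounds):
        w = server.pull()                  # fetch the current global weight
        g = shard_gradient(w, xs, ys)      # local computation on own subset
        server.push(g, lr)                 # asynchronous global update


def train(num_workers=4, rounds=300, lr=0.1):
    xs = [i / 10 for i in range(1, 11)]
    ys = [3.0 * x for x in xs]             # toy ground truth: w = 3
    server = ParameterServer()
    threads = [
        threading.Thread(
            target=worker,
            args=(server, xs[k::num_workers], ys[k::num_workers], rounds, lr),
        )
        for k in range(num_workers)        # one disjoint subset per worker
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.weight
```

Calling `train()` should return a weight very close to 3.0. The order of pushes varies from run to run, and each gradient may be based on a slightly stale weight. The small learning rate keeps that staleness from destabilizing this toy model. Python threads only illustrate the update protocol here and give no real parallel speedup.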
