On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems
Wonje Choi, Karthi Duraisamy, Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu

TL;DR
This paper introduces a hybrid on-chip network architecture combining wireline and wireless links to enhance communication efficiency in CPU-GPU systems for CNN training, resulting in faster, more energy-efficient deep learning model training.
Contribution
It proposes a novel hybrid NoC design tailored for CNN training workloads on heterogeneous manycore systems, significantly reducing latency and energy consumption.
Findings
1.8x reduction in network latency
2.2x increase in network throughput
25% savings in energy-delay-product
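For readers unfamiliar with the last metric: energy-delay product is conventionally defined as the energy consumed multiplied by the execution time. The sketch below only spells out what a 25% saving means under that standard definition. The symbols E and D and the subscripts "hybrid" and "baseline" are labels assumed here for illustration, and the summary does not state which baseline the figure is measured against.

```latex
% Standard definition: energy E times delay (execution time) D.
\mathrm{EDP} = E \cdot D

% A 25% saving means the proposed design's EDP is three quarters of the baseline's.
\frac{\mathrm{EDP}_{\text{hybrid}}}{\mathrm{EDP}_{\text{baseline}}} = 1 - 0.25 = 0.75
```

The latency and throughput figures above are network-level metrics, so they cannot be combined directly with the EDP ratio to recover separate energy or delay numbers.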
Abstract
Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. To address this issue, we first analyze the on-chip traffic patterns that arise from the computational processes associated with training…
