Accelerating Training of Deep Neural Networks via Sparse Edge Processing
Sourya Dey, Yinan Shao, Keith M. Chugg, Peter A. Beerel

TL;DR
This paper introduces a reconfigurable hardware architecture for deep neural networks that combines structured sparsity, edge-processing, and parallelization to reduce training time by up to 35x relative to GPUs and network complexity by up to 30x, while maintaining high inference fidelity.
Contribution
The paper presents a novel hardware architecture that supports both online training and inference of DNNs, using pre-defined structured sparsity and edge-processing to lower memory and computational requirements.
Findings
Network complexity reduced by up to 30x.
Training time decreased by up to 35x compared to GPUs.
Architecture adapts automatically to different network sizes.
Abstract
We propose a reconfigurable hardware architecture for deep neural networks (DNNs) capable of online training and inference, which uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements. This novel architecture introduces the notion of edge-processing to provide flexibility and combines junction pipelining and operational parallelization to speed up training. The overall effect is to reduce network complexity by factors up to 30x and training time by up to 35x relative to GPUs, while maintaining high fidelity of inference results. This has the potential to enable extensive parameter searches and development of the largely unexplored theoretical foundation of DNNs. The architecture automatically adapts itself to different network sizes given available hardware resources. As proof of concept, we show results obtained for…
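The abstract's central idea is that sparsity is fixed before training rather than learned or pruned. The following is a minimal software sketch of that idea, under stated assumptions. It builds a hypothetical connection pattern for one junction (the weights between two adjacent layers) in which every left neuron has the same out-degree and every right neuron has the same in-degree, and it shows that gradient updates restricted to that pattern leave the sparsity unchanged. The cyclic assignment, the names `structured_sparse_mask` and `sgd_step`, and the layer sizes are illustrative assumptions, not the authors' exact scheme or hardware dataflow.

```python
import numpy as np


def structured_sparse_mask(n_left: int, n_right: int, d_out: int) -> np.ndarray:
    """Return an (n_right, n_left) 0/1 mask for one junction.

    Every left neuron has out-degree d_out, and every right neuron has
    in-degree n_left * d_out / n_right. This is a hypothetical cyclic pattern
    for illustration only, not the paper's exact connection scheme.
    """
    if not 1 <= d_out <= n_right:
        raise ValueError("d_out must be in [1, n_right]")
    if (n_left * d_out) % n_right != 0:
        raise ValueError("n_left * d_out must be divisible by n_right")
    mask = np.zeros((n_right, n_left), dtype=np.int64)
    for j in range(n_left):
        for k in range(d_out):
            # Edge index e = j*d_out + k is dealt round-robin to right neurons,
            # so each right neuron receives exactly the same number of edges.
            mask[(j * d_out + k) % n_right, j] = 1
    return mask


def sgd_step(W: np.ndarray, mask: np.ndarray, x: np.ndarray,
             delta: np.ndarray, lr: float) -> np.ndarray:
    """One SGD update restricted to the pre-defined edges.

    Weights outside the mask receive zero gradient, so the sparsity
    pattern chosen before training never changes.
    """
    grad = np.outer(delta, x) * mask
    return W - lr * grad


if __name__ == "__main__":
    n_left, n_right, d_out = 800, 100, 10  # hypothetical junction sizes
    mask = structured_sparse_mask(n_left, n_right, d_out)
    print(int(mask.sum(axis=0).max()), int(mask.sum(axis=1).max()))  # 10 80
    print((n_left * n_right) // int(mask.sum()))  # 10 (x fewer weights than dense)

    rng = np.random.default_rng(0)
    W = rng.standard_normal((n_right, n_left)) * mask
    x = rng.standard_normal(n_left)
    delta = rng.standard_normal(n_right)  # stand-in for a backpropagated error
    W = sgd_step(W, mask, x, delta, lr=0.01)
    print(bool(np.all(W[mask == 0] == 0)))  # True: pattern preserved
```

In this toy junction, the reduction in stored weights equals `n_right / d_out`, which is 10x for the sizes chosen here. The up to 30x figure quoted above is the paper's own result and is not reproduced by this sketch. A fixed in-degree and out-degree is what makes the per-junction work predictable, which is plausibly why a pre-determined pattern suits a hardware pipeline better than an irregular, pruned one. The sketch still stores a dense matrix, though. An edge-processing design would instead store and iterate over only the `n_left * d_out` nonzero weights.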
