ZNN - A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-Core and Many-Core Shared Memory Machines

Aleksandar Zlateski; Kisuk Lee; H. Sebastian Seung

arXiv:1510.06706 · cs.NE · June 21, 2016

TL;DR

ZNN introduces a parallel algorithm for training 3D convolutional networks that achieves near-linear speedup on multi-core and many-core shared memory machines, making ConvNet training faster and more scalable.

Contribution

The paper presents a novel task-based parallel algorithm for ConvNet training that attains near-linear speedup on shared-memory architectures, with an efficient implementation called ZNN.

Findings

01

ZNN achieves roughly linear speedup with the number of CPU cores.

02

ZNN attains over 90x speedup on a many-core Xeon Phi CPU.

03

Performance varies with network architecture and kernel sizes.
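
The speedup findings above can be read against Brent's theorem, which the abstract invokes. The block below is a short worked bound, not taken from the paper: the symbols T_1 (total work on one processor), T_\infty (length of the critical path in the task dependency graph), T_P (running time on P processors), and S_P (speedup) are notation assumed here for illustration.

```latex
% Brent's bound on the P-processor running time (assumed notation):
T_P \;\le\; \frac{T_1}{P} + T_\infty

% Dividing T_1 by both sides gives a lower bound on the speedup:
S_P \;=\; \frac{T_1}{T_P} \;\ge\; \frac{T_1}{T_1/P + T_\infty}
      \;=\; \frac{P}{1 + P\,T_\infty / T_1}

% When the available parallelism T_1 / T_\infty is much larger than P,
% the denominator approaches 1 and the speedup approaches P:
\frac{T_1}{T_\infty} \gg P \quad\Longrightarrow\quad S_P \approx P
```

Under this reading, the ratio T_1 / T_\infty depends on the shape of the task graph, which is consistent with the abstract's restriction to wide network architectures and with the finding that performance varies with architecture and kernel size.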

Abstract

Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent's theorem to the task dependency graph implies that linear speedup with the number of processors is attainable within the PRAM model of parallel computation, for wide network architectures. To attain such performance on real shared-memory machines, our algorithm computes convolutions converging on the same node of the network with temporal locality to reduce cache misses, and sums the convergent convolution outputs via an almost wait-free concurrent method to reduce time spent in critical sections. We implement the algorithm with a publicly available software package called ZNN.…
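
The idea of summing convergent convolution outputs while keeping critical sections short can be illustrated with a small sketch. This is an assumption-laden illustration in Python, not ZNN's actual code: the class `ConcurrentSum`, the helper `parallel_sum`, and the use of plain lists as buffers are all hypothetical. The lock guards only a pointer swap and a counter; the element-wise addition, which is the expensive part, runs outside the lock.

```python
import threading


class ConcurrentSum:
    """Hypothetical accumulator: sums `expected` equally sized buffers."""

    def __init__(self, expected):
        self._lock = threading.Lock()      # guards only the swap and the counter
        self._slot = None                  # buffer parked by an earlier thread
        self._merges_left = expected - 1   # N buffers need N - 1 pairwise merges

    def add(self, buf):
        """Contribute `buf` once; the thread that completes the sum gets it back."""
        while True:
            with self._lock:
                if self._merges_left == 0:
                    return buf             # every merge is done: buf is the total
                if self._slot is None:
                    self._slot = buf       # park the buffer and leave immediately
                    return None
                # Take the parked buffer and claim one merge for this thread.
                other, self._slot = self._slot, None
                self._merges_left -= 1
            # Expensive element-wise addition happens outside the critical section.
            for i, value in enumerate(other):
                buf[i] += value


def parallel_sum(buffers):
    """Sum the given buffers using one thread per buffer."""
    if not buffers:
        raise ValueError("need at least one buffer")
    acc = ConcurrentSum(len(buffers))
    results = []

    def worker(buf):
        out = acc.add(buf)
        if out is not None:
            results.append(out)

    # Copy each buffer so the caller's data is not modified in place.
    threads = [threading.Thread(target=worker, args=(list(b),)) for b in buffers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results[0]
```

A call such as `parallel_sum([[1, 2], [3, 4], [5, 6]])` returns the element-wise total. The design choice worth noting is that a thread never waits for another thread's addition to finish: it either parks its buffer and returns, or picks up a parked buffer and adds it on its own time.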

Taxonomy

Methods: Convolution