Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping

Wushi Dong; Murat Keceli; Rafael Vescovi; Hanyu Li; Corey Adams; Elise Jennings; Samuel Flender; Tom Uram; Venkatram Vishwanath; Nicola Ferrier; Narayanan Kasthuri; Peter Littlewood

arXiv:1905.06236·cs.DC·December 11, 2019·1 cites

TL;DR

This paper presents a scalable distributed training approach for flood-filling networks (FFNs) used in brain mapping. Run on high-performance computing infrastructure, it reduces training time while maintaining a similar level of inference performance.

Contribution

The paper implements synchronous, data-parallel training of FFNs using the Horovod library, in contrast to the asynchronous scheme used in the published FFN code. The approach scales on a supercomputer, and a study of batch-size effects offers guidance on training parameters.
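To make the idea of synchronous, data-parallel training concrete, here is a minimal toy sketch in plain Python. It is not the authors' code and does not use Horovod: the scalar regression model, the function names (`local_gradient`, `allreduce_mean`, `sync_step`, `train`), and the hyperparameters are all assumptions chosen for illustration. Each simulated worker computes a gradient on its own shard of the data, the gradients are averaged (the role an allreduce plays in a library such as Horovod), and every worker applies the same update.

```python
# Toy simulation of synchronous data-parallel SGD (illustrative only; no Horovod).
# Model: scalar linear regression y = w * x with loss 0.5 * (w*x - y)**2.

def local_gradient(w, shard):
    """Mean gradient of the loss over one worker's shard of (x, y) pairs."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an allreduce that averages one value per worker."""
    return sum(values) / len(values)

def sync_step(w, shards, lr):
    """One synchronous step: all workers compute gradients on their own shard,
    the gradients are averaged, and the same update is applied everywhere."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * allreduce_mean(grads)

def train(data, num_workers, lr=0.05, steps=200):
    """Run synchronous SGD with the data split into equal-sized shards."""
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = 0.0
    for _ in range(steps):
        w = sync_step(w, shards, lr)
    return w

# Hypothetical data set: 8 samples generated with a true weight of 3.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
w_single = train(data, num_workers=1)  # one worker sees all the data
w_multi = train(data, num_workers=4)   # four workers, two samples each
```

Because the shards are equal in size, the average of the per-worker gradients equals the gradient over the full batch, so `w_single` and `w_multi` follow the same trajectory up to floating-point rounding. In an asynchronous scheme, by contrast, workers would apply updates without waiting for one another, and this equivalence would not hold in general.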

Findings

01

Distributed training scaled well up to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer

02

Achieved a similar level of inference performance in less training time than previous methods

03

Identified optimal batch sizes and learning rates for FFN training

Abstract

Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling network (FFN) architecture has demonstrated leading performance for segmenting structures from this data. However, training the network is computationally expensive. To reduce the training time, we implemented synchronous, data-parallel distributed training using the Horovod library, which differs from the asynchronous training scheme used in the published FFN code. We demonstrated that our distributed training scaled well up to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer. Our trained models achieved a similar level of inference performance but took less training time than previous methods. Our study on the effects of different batch sizes on FFN training suggests ways to further improve…

Code & Models

Repositories

wushidonguc/distributed_ffn (TensorFlow)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques