Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
Wushi Dong, Murat Keceli, Rafael Vescovi, Hanyu Li, Corey Adams, Elise Jennings, Samuel Flender, Tom Uram, Venkatram Vishwanath, Nicola Ferrier, Narayanan Kasthuri, Peter Littlewood

TL;DR
This paper presents a scalable distributed training approach for flood-filling networks (FFNs) used in brain mapping. Run on high-performance computing infrastructure, it reduces training time while maintaining inference performance.
Contribution
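The paper introduces a synchronous, data-parallel training method for flood-filling networks (FFNs) built on the Horovod library, replacing the asynchronous scheme in the published FFN code. The method scales efficiently on supercomputers, and the accompanying study of batch sizes gives guidance on training parameters.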
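To make "synchronous, data-parallel" concrete, here is a minimal sketch of the idea, not the authors' implementation and not Horovod's API. It simulates several workers in a single process using only the Python standard library. The toy one-parameter model, the function names (`local_gradient`, `allreduce_mean`, `train_sync`), and the learning rate are all assumptions chosen for illustration. Each worker computes a gradient on its own data shard, the gradients are averaged across workers (the role an allreduce plays in a real distributed run), and every replica applies the same update, so the replicas never drift apart.

```python
import random


def local_gradient(w, shard):
    """Mean gradient of 0.5 * (w * x - y) ** 2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an allreduce average: every worker gets the same mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def train_sync(shards, lr=0.5, steps=200):
    """Synchronous data-parallel SGD on a toy model y = w * x."""
    # One model replica per worker, all starting from the same weight.
    weights = [0.0] * len(shards)
    for _ in range(steps):
        # Each worker computes a gradient on its own shard only.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # Synchronization point: all workers wait for the averaged gradient.
        averaged = allreduce_mean(grads)
        # Every replica applies the identical update.
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights


random.seed(0)
# Hypothetical dataset: y = 3 * x, split evenly across 4 simulated workers.
data = [(x, 3.0 * x) for x in (random.uniform(-1, 1) for _ in range(64))]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]
replicas = train_sync(shards)
```

Because the shards here are equal in size, the averaged gradient matches the gradient over the combined batch, which is why adding workers in this scheme increases the effective batch size. In an asynchronous scheme, by contrast, workers would apply updates without waiting for one another.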
Findings
Distributed training scaled to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer
Achieved a similar level of inference performance in less training time than previous methods
Identified optimal batch sizes and learning rates for FFN training
Abstract
Mapping all the neurons in the brain requires automatic reconstruction of entire cells from volume electron microscopy data. The flood-filling network (FFN) architecture has demonstrated leading performance for segmenting structures from this data. However, the training of the network is computationally expensive. To reduce the training time, we implemented synchronous and data-parallel distributed training using the Horovod library, which is different from the asynchronous training scheme used in the published FFN code. We demonstrated that our distributed training scaled well up to 2048 Intel Knights Landing (KNL) nodes on the Theta supercomputer. Our trained models achieved a similar level of inference performance, but took less training time compared to previous methods. Our study on the effects of different batch sizes on FFN training suggests ways to further improve…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Stochastic Gradient Optimization Techniques
