Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures
Joseph Suarez, Clare Zhu

TL;DR
This paper introduces a dynamic batching method that accelerates a large class of dynamic neural network architectures, consistently yielding speedups of over 10x and up to 1000x over a naive implementation for sparsely gated mixture of experts layers.
Contribution
The authors propose a simple, effective dynamic batching approach with theoretical performance bounds applicable to a wide range of architectures.
Findings
Achieved over 10x speedup on general dynamic architectures.
Attained up to 1000x speedup over the naive implementation on sparsely gated mixture of experts layers.
Provided performance bounds for architectures not known a priori, plus a stronger bound when the architecture is a predetermined balanced tree.
Abstract
We present a simple dynamic batching approach applicable to a large class of dynamic architectures that consistently yields speedups of over 10x. We provide performance bounds when the architecture is not known a priori and a stronger bound in the special case where the architecture is a predetermined balanced tree. We evaluate our approach on Johnson et al.'s recent visual question answering (VQA) result on their CLEVR dataset, obtained by Inferring and Executing Programs (IEP). We also evaluate on sparsely gated mixture of experts layers and achieve speedups of up to 1000x over the naive implementation.
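The core idea behind batching a mixture of experts layer can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`naive_dispatch`, `batched_dispatch`), the toy experts, and the routing list are all hypothetical, and plain Python lists stand in for tensors. The sketch assumes each example is routed to exactly one expert and that each expert accepts a batch. It contrasts one expert call per example with one call per distinct expert.

```python
from collections import defaultdict


def naive_dispatch(inputs, routes, experts):
    """Naive baseline: invoke the routed expert once per example."""
    outputs, calls = [], 0
    for x, r in zip(inputs, routes):
        outputs.append(experts[r]([x])[0])  # batch of size 1
        calls += 1
    return outputs, calls


def batched_dispatch(inputs, routes, experts):
    """Group examples by expert, then invoke each expert once on its group."""
    groups = defaultdict(list)
    for i, r in enumerate(routes):
        groups[r].append(i)  # remember original positions

    outputs, calls = [None] * len(inputs), 0
    for r, idxs in groups.items():
        results = experts[r]([inputs[i] for i in idxs])  # one batched call
        calls += 1
        for i, y in zip(idxs, results):
            outputs[i] = y  # scatter results back to input order
    return outputs, calls


# Hypothetical toy experts: each maps a batch (list) to a batch (list).
experts = {
    "double": lambda batch: [2 * x for x in batch],
    "square": lambda batch: [x * x for x in batch],
    "negate": lambda batch: [-x for x in batch],
}
inputs = [1, 2, 3, 4, 5, 6]
routes = ["double", "square", "double", "negate", "square", "double"]

naive_out, naive_calls = naive_dispatch(inputs, routes, experts)
batched_out, batched_calls = batched_dispatch(inputs, routes, experts)
print(naive_out == batched_out, naive_calls, batched_calls)
```

In this toy setting both paths produce the same outputs, while the number of expert invocations drops from one per example to one per distinct expert. On real hardware the benefit would come from each batched call running as a single large operation rather than many small ones; the speedup figures quoted above are the paper's, not something this sketch measures.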
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
