Enabling Large Batch Size Training for DNN Models Beyond the Memory Limit While Maintaining Performance
XinYu Piao, DoangJoo Synn, JooYoung Park, Jong-Kook Kim

TL;DR
This paper introduces Micro-Batch Processing (MBP), a technique that enables training deep neural networks with larger batch sizes than memory constraints allow, by splitting batches and normalizing loss to maintain performance.
Contribution
The paper presents Micro-Batch Processing (MBP), a method that enables training with batch sizes beyond the system's memory limit without additional hardware or extra memory.
Findings
MBP enables training with batch sizes larger than the available memory would otherwise allow.
Loss normalization maintains model performance during micro-batch processing.
The method works without additional hardware or memory upgrades.
Abstract
Recent deep learning models are difficult to train with a large batch size because commodity machines may not have enough memory to hold both the model and a large batch of data. The batch size is a training hyper-parameter, and it is limited by the memory capacity of the target machine, because the batch must fit into the memory that remains after the model is loaded. The size of each data item also matters: the larger each item is, the smaller the batch that fits into the remaining memory. This paper proposes a method called Micro-Batch Processing (MBP) to address this problem. MBP splits a batch into micro-batches small enough to fit in the remaining memory and processes them sequentially. After…
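The splitting and loss normalization described above can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the toy linear model, the function names (`mse_grad`, `split_into_micro_batches`, `accumulated_grad`), and the specific normalization are assumptions made for illustration. The normalization used here is the standard gradient-accumulation one, where each micro-batch's mean loss is weighted by its share of the full batch, so that the accumulated gradient matches the gradient of the full-batch mean loss; the paper's exact formula may differ.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mse_grad(w: float, b: float, data: Sequence[Sample]) -> Tuple[float, float]:
    """Gradient of the mean squared error of y ~ w*x + b over `data`."""
    n = len(data)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in data) / n
    return gw, gb


def split_into_micro_batches(batch: Sequence[Sample], micro_size: int) -> List[Sequence[Sample]]:
    """Cut the batch into consecutive chunks of at most `micro_size` items."""
    return [batch[i:i + micro_size] for i in range(0, len(batch), micro_size)]


def accumulated_grad(w: float, b: float, batch: Sequence[Sample], micro_size: int) -> Tuple[float, float]:
    """Process micro-batches one at a time and accumulate normalized gradients.

    Assumption: each micro-batch's mean-loss gradient is scaled by
    len(micro) / len(batch), so the sum equals the full-batch gradient.
    """
    total_gw = 0.0
    total_gb = 0.0
    for micro in split_into_micro_batches(batch, micro_size):
        gw, gb = mse_grad(w, b, micro)
        scale = len(micro) / len(batch)  # share of the full batch
        total_gw += scale * gw
        total_gb += scale * gb
    return total_gw, total_gb


# Hypothetical data: 10 samples, processed in micro-batches of 4, 4, and 2.
batch = [(float(i), 3.0 * i + 1.0) for i in range(10)]
full = mse_grad(0.5, 0.0, batch)
micro = accumulated_grad(0.5, 0.0, batch, micro_size=4)

# The two gradients should agree up to floating-point rounding.
matches = all(math.isclose(f, m, rel_tol=1e-9) for f, m in zip(full, micro))
print("full-batch gradient:     ", full)
print("micro-batch accumulation:", micro)
print("match:", matches)
```

Only one micro-batch needs to be resident at a time, which is why the peak memory is set by `micro_size` rather than by the full batch size. Weighting by `len(micro) / len(batch)` rather than by a fixed `1 / num_micro_batches` keeps the result exact even when the last micro-batch is smaller than the others.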
Taxonomy
Topics: Advanced Data Storage Technologies · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
