Profiling based Out-of-core Hybrid Method for Large Neural Networks
Yuki Ito, Haruki Imai, Tung Le Duc, Yasushi Negishi, Kiyokuni Kawachiya, Ryo Matsumiya, Toshio Endo

TL;DR
This paper introduces PoocH, a profiling-based hybrid method that intelligently manages data swapping and recomputation to enable training large neural networks exceeding GPU memory limits with reduced performance overhead.
Contribution
PoocH dynamically selects layers for swapping or recomputing based on runtime profiling, improving large neural network training efficiency on limited GPU memory.
Findings
Successfully trained a neural network requiring 50 GB of memory on a 16 GB GPU
Incurred a performance degradation of 38% on x86 and 28% on POWER9 compared to in-core training
Extended the Chainer framework to implement and evaluate PoocH
Abstract
GPUs are widely used to accelerate deep learning with neural networks (NNs). However, because GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on GPUs. To compute NNs that exceed GPU memory capacity, existing work has proposed data-swapping and recomputing methods. These methods, however, incur performance overhead from data movement or additional computation. To reduce this overhead, it is important to consider the characteristics of each layer, such as its size and recomputation cost. Following this direction, we propose the Profiling based out-of-core Hybrid method (PoocH). PoocH determines the target layers for swapping or recomputing based on runtime profiling. We implemented PoocH by extending a deep learning framework, Chainer, and evaluated its performance. With PoocH, we successfully computed an NN requiring 50…
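The per-layer choice between swapping and recomputing can be illustrated with a small sketch. This is not PoocH's actual algorithm, which the abstract does not spell out; it is a simple greedy heuristic under stated assumptions. The names `LayerProfile` and `plan_layers`, the profile fields, and all numbers are hypothetical. The sketch also ignores that swap transfers can overlap with computation and that recomputation needs the layer's input to be available.

```python
from dataclasses import dataclass

MIB = 1 << 20


@dataclass
class LayerProfile:
    """Hypothetical per-layer measurements gathered in a profiling run."""
    name: str
    activation_bytes: int  # memory held for the backward pass (assumed > 0)
    recompute_ms: float    # measured time to re-run the layer's forward pass
    swap_ms: float         # measured time to copy the data to host and back


def plan_layers(profiles, memory_budget_bytes):
    """Greedy sketch: evict the layers that free memory most cheaply.

    Every layer starts as "keep". Layers are then visited in order of
    overhead per byte freed, and each one is assigned whichever of
    "swap" or "recompute" was cheaper in the profile, until the
    resident set fits within the budget.
    """
    plan = {p.name: "keep" for p in profiles}
    resident = sum(p.activation_bytes for p in profiles)

    def overhead_per_byte(p):
        return min(p.recompute_ms, p.swap_ms) / p.activation_bytes

    for p in sorted(profiles, key=overhead_per_byte):
        if resident <= memory_budget_bytes:
            break
        plan[p.name] = "recompute" if p.recompute_ms < p.swap_ms else "swap"
        resident -= p.activation_bytes
    return plan, resident


# Illustrative numbers only; they are not taken from the paper.
profiles = [
    LayerProfile("conv1", 400 * MIB, recompute_ms=40.0, swap_ms=30.0),
    LayerProfile("relu1", 400 * MIB, recompute_ms=0.5, swap_ms=30.0),
    LayerProfile("conv2", 200 * MIB, recompute_ms=25.0, swap_ms=20.0),
    LayerProfile("fc", 10 * MIB, recompute_ms=2.0, swap_ms=3.0),
]
plan, resident = plan_layers(profiles, memory_budget_bytes=500 * MIB)
print(plan)
print(resident // MIB, "MiB resident")
```

In this example the cheap-to-recompute activation layer is marked for recomputation and the large convolution output is marked for swapping, which is the kind of layer-dependent decision the abstract motivates.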
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
