Profiling and optimization of multi-card GPU machine learning jobs
Marcin Lawenda, Kyrylo Khloponin, Krzesimir Samborski, Łukasz Szustak

TL;DR
This paper analyzes various optimization techniques for multi-GPU machine learning jobs, focusing on performance, hardware configurations, and the impact of different tuning strategies on large models.
Contribution
It provides a comprehensive analysis of parallelization and optimization strategies for multi-GPU machine learning, including experimental evaluation on NVIDIA H100 hardware.
Findings
Parallelization strategies significantly affect training efficiency.
Optimization techniques impact VRAM utilization and memory transfers.
Task nature influences iteration time and resource usage.
Abstract
The effectiveness and efficiency of machine learning methodologies are crucial, especially with respect to the quality of results and computational cost. This paper discusses different model optimization techniques, providing a comprehensive analysis of key performance indicators. Several parallelization strategies for image recognition, adapted to different hardware and software configurations, including distributed data parallelism and distributed hardware processing, are analyzed. Selected optimization strategies are studied in detail, highlighting the related challenges and advantages of their implementation. Furthermore, the impact of different performance improvement techniques (DPO, LoRA, QLoRA, and QAT) on the tuning process of large language models is investigated. Experimental results illustrate how the nature of the task affects the iteration time in a multiprocessor…
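As a rough illustration of why a technique such as LoRA, named in the abstract, can reduce the memory needed for tuning, the sketch below counts trainable parameters for a single linear layer with and without a low-rank adapter. It is not taken from the paper. The layer size (4096 x 4096), the rank (8), and both function names are hypothetical, chosen only to make the arithmetic concrete.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights in a low-rank adapter.

    The frozen weight W is left untouched; only the factors
    A (rank x d_in) and B (d_out x rank) are trained.
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Hypothetical layer shape and adapter rank (assumptions, not from the paper).
    d_in = d_out = 4096
    rank = 8

    full = full_params(d_in, d_out)
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"full fine-tuning: {full} trainable weights")
    print(f"LoRA (rank {rank}): {lora} trainable weights")
    print(f"fraction trained: {lora / full:.4%}")
```

Under these assumptions the adapter trains well under 1% of the layer's weights, so gradients and optimizer state shrink accordingly. The actual VRAM savings also depend on activations, precision, and the optimizer, which this count does not model.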
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Embedded Systems Design Techniques
