SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates
Baixi Sun, Xiaodong Yu, Chengming Zhang, Jiannan Tian, Sian Jin, Kamil Iskra, Tao Zhou, Tekin Bicer, Pete Beckman, and Dingwen Tao

TL;DR
SOLAR is a highly optimized data loading framework designed to significantly accelerate the training of CNN-based scientific surrogates on large datasets, addressing the data loading bottleneck in high-performance computing environments.
Contribution
This work introduces SOLAR, a novel data loader with optimized data access patterns and workload balancing, tailored for large-scale scientific surrogate training.
Findings
Achieves up to 24.4X speedup over PyTorch Data Loader
Provides 3.52X speedup over existing state-of-the-art data loaders
Effectively handles terabyte-scale scientific datasets
Abstract
CNN-based surrogates have become prevalent in scientific applications to replace conventional time-consuming physical approaches. Although these surrogates can yield satisfactory results with significantly lower computation costs over small training datasets, our benchmarking results show that data-loading overhead becomes the major performance bottleneck when training surrogates with large datasets. In practice, surrogates are usually trained with high-resolution scientific data, which can easily reach the terabyte scale. Several state-of-the-art data loaders have been proposed to improve loading throughput in general CNN training; however, they are sub-optimal when applied to surrogate training. In this work, we propose SOLAR, a surrogate data loader that can ultimately increase loading throughput during training. It leverages our three key observations during the benchmarking…
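The abstract's claim that data loading becomes the dominant cost can be checked on any training loop by timing how long each step waits for a batch versus how long it spends computing. The following is a minimal sketch of that measurement, not code from SOLAR; the names profile_epoch and loading_share are hypothetical, and it assumes train_step runs synchronously (on a GPU, an explicit device synchronization would be needed before reading the clock).

```python
import time
from typing import Any, Callable, Iterable, Tuple


def profile_epoch(
    batches: Iterable[Any],
    train_step: Callable[[Any], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float]:
    """Return (seconds blocked on the loader, seconds in train_step) for one pass.

    Hypothetical helper for illustration; assumes train_step is synchronous.
    """
    load_s = 0.0
    compute_s = 0.0
    it = iter(batches)
    while True:
        t0 = clock()
        try:
            batch = next(it)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = clock()
        train_step(batch)  # time spent in the training computation
        t2 = clock()
        load_s += t1 - t0
        compute_s += t2 - t1
    return load_s, compute_s


def loading_share(load_s: float, compute_s: float) -> float:
    """Fraction of measured step time spent waiting on data (0.0 if nothing was timed)."""
    total = load_s + compute_s
    return load_s / total if total > 0 else 0.0
```

Passing any iterable of batches (for example a framework data loader) and a step function yields the two totals; a loading share close to 1 indicates a loop that is bound by data loading rather than computation.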
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Parallel Computing and Optimization Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
