TensorSocket: Shared Data Loading for Deep Learning Training
Ties Robroek (IT University of Copenhagen), Neil Kim Nielsen (IT University of Copenhagen), Pınar Tözün (IT University of Copenhagen)

TL;DR
TensorSocket is a novel data loading framework that enables shared data loading across multiple deep learning training processes, significantly improving throughput and reducing costs by mitigating CPU bottlenecks and avoiding redundant computations.
Contribution
The paper introduces TensorSocket, a hardware- and pipeline-agnostic system that lets different models train simultaneously from one shared data loader, outperforming existing solutions in efficiency and ease of deployment.
Findings
Increases training throughput by up to 100%.
Achieves 50% cost savings on cloud instances.
Outperforms state-of-the-art shared data loading solutions.
Abstract
Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on a set of parameters (e.g., hyper-parameter tuning) and model architecture (e.g., neural architecture search), among other things that yield the highest accuracy. The computational efficiency of these training tasks depends highly on how well the training data is supplied to the training process. The repetitive nature of these tasks results in the same data processing pipelines running over and over, exacerbating the need for and costs of computational resources. In this paper, we present TensorSocket to reduce the computational needs of deep learning training by enabling simultaneous training processes to share the same data loader. TensorSocket mitigates CPU-side bottlenecks in cases where the collocated training workloads have high throughput on…
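The core idea in the abstract, several training processes consuming batches from one data loader instead of each running its own pipeline, can be illustrated with a minimal sketch. This is not TensorSocket's API or implementation: the names `shared_loader`, `trainer`, and `run_demo` are hypothetical, threads stand in for separate training processes, and in-memory queues stand in for whatever transport the real system uses. It only shows that the loading and preprocessing step runs once per batch regardless of how many trainers consume it.

```python
import queue
import threading


def shared_loader(load_batch, num_batches, consumer_queues):
    """Run the loading pipeline once and fan each batch out to every consumer."""
    for i in range(num_batches):
        batch = load_batch(i)  # expensive preprocessing happens once per batch
        for q in consumer_queues:
            q.put(batch)  # bounded queue: blocks if a consumer falls behind
    for q in consumer_queues:
        q.put(None)  # sentinel: no more batches


def trainer(name, q, results):
    """Stand-in for a training process: consume batches until the sentinel."""
    seen = []
    while True:
        batch = q.get()
        if batch is None:
            break
        seen.append(sum(batch))  # placeholder for a real training step
    results[name] = seen


def run_demo(num_trainers=3, num_batches=5):
    load_calls = []  # records how often the loading pipeline actually ran

    def load_batch(i):
        load_calls.append(i)
        return [i, i + 1, i + 2]  # toy "batch"

    queues = [queue.Queue(maxsize=2) for _ in range(num_trainers)]
    results = {}
    threads = [
        threading.Thread(target=trainer, args=(f"trainer{k}", q, results))
        for k, q in enumerate(queues)
    ]
    for t in threads:
        t.start()
    shared_loader(load_batch, num_batches, queues)
    for t in threads:
        t.join()
    return load_calls, results


if __name__ == "__main__":
    calls, results = run_demo()
    print(f"pipeline ran {len(calls)} times for {len(results)} trainers")
```

With three trainers and five batches, the loading function is called five times rather than fifteen, which is the redundancy that shared data loading is meant to remove. The bounded queues also mean the loader is paced by the slowest consumer, a design trade-off any such scheme has to handle.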
Taxonomy
Topics: Computational Physics and Python Applications
Methods: Sparse Evolutionary Training
