Redox: Improving I/O Efficiency of Model Training Through File Redirection

Yuhao Li; Xuanhua Shi; Yunfei Zhao; Yongluan Zhou; Yusheng Hua; Xuehai Qian

arXiv:2505.16280 · cs.DC · December 9, 2025

TL;DR

Redox is a system that improves I/O efficiency in model training through file redirection, which enables batch reads and prefetching and significantly shortens training time.
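Prefetching fits naturally with chunk-granular batch reads: while the trainer consumes the files of one chunk, the next chunk can already be loading. The sketch below shows that overlap with a single background thread. It is a generic double-buffering pattern, not Redox's implementation; `prefetching_reader` and the caller-supplied `read_chunk` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def prefetching_reader(
    chunk_ids: Sequence[int],
    read_chunk: Callable[[int], List[bytes]],
) -> Iterator[bytes]:
    """Yield every file of every chunk, reading chunk i+1 while chunk i is consumed.

    `read_chunk` is a hypothetical caller-supplied function that performs one
    batch read from storage and returns all files of the given chunk.
    """
    if not chunk_ids:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_chunk, chunk_ids[0])
        for i in range(len(chunk_ids)):
            files = pending.result()  # block until the current chunk is in memory
            if i + 1 < len(chunk_ids):
                # Start the next batch read before handing out the current files.
                pending = pool.submit(read_chunk, chunk_ids[i + 1])
            yield from files
```

A caller would pass a shuffled list of chunk ids and a function that reads one chunk from disk; the training loop then iterates over the generator as it would over any file stream, and the disk read for the next chunk overlaps with computation on the current one.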

Contribution

Redox introduces a novel file redirection technique and a batch read protocol to improve I/O efficiency in distributed model training.
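The redirection idea can be simulated in a few lines. This sketch rests on simplifying assumptions that are not taken from the paper: files are grouped into fixed-size contiguous chunks, only one chunk is buffered at a time, and the function name `redirected_epoch` is hypothetical. A request for a file that is already in memory is served directly; otherwise it is answered with some other loaded file, so every chunk is read from disk exactly once per epoch and every file is still consumed exactly once.

```python
import random
from typing import List, Set, Tuple


def redirected_epoch(num_files: int, chunk_size: int, seed: int = 0) -> Tuple[List[int], int, int]:
    """Simulate one epoch of file redirection over chunked storage.

    Returns (served_order, batch_reads, redirected), where `redirected` counts
    requests answered with a different file than the one asked for.
    """
    rng = random.Random(seed)
    requests = list(range(num_files))
    rng.shuffle(requests)  # the order a conventional shuffled loader would request
    num_chunks = (num_files + chunk_size - 1) // chunk_size
    unread: Set[int] = set(range(num_chunks))  # chunks not yet read this epoch
    buffer: Set[int] = set()                   # loaded but not yet consumed files
    served: List[int] = []
    batch_reads = redirected = 0
    for req in requests:
        if not buffer:
            # Prefer the chunk holding the requested file; if that chunk was
            # already consumed, fall back to any unread chunk.
            cid = req // chunk_size
            if cid not in unread:
                cid = rng.choice(sorted(unread))
            unread.remove(cid)
            buffer = set(range(cid * chunk_size, min((cid + 1) * chunk_size, num_files)))
            batch_reads += 1  # one batch read loads the whole chunk
        if req in buffer:
            choice = req
        else:
            choice = rng.choice(sorted(buffer))  # redirect to an already-loaded file
            redirected += 1
        buffer.remove(choice)
        served.append(choice)
    return served, batch_reads, redirected
```

With `chunk_size=1` no request is ever redirected and every file costs its own read; larger chunks trade per-file reads for redirections while still covering the whole dataset once per epoch.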

Findings

1. Achieves up to 4.57x faster training than PyTorch.
2. Redox's file redirection has minimal impact on training randomness.
3. Efficient local and distributed read protocols reduce wasted data reads.

Abstract

This paper proposes Redox, a training data management system designed to achieve high I/O efficiency. The key insight is a new observation about file redirection: in model training, when the data in one file is requested, the system is free to return the data of another file instead. Based on this property, Redox adopts a bold design principle: chunks of data files are always read from disk in batch, and once a chunk is loaded, all of its files are consumed without being loaded again. We propose efficient local and distributed file read protocols based on this principle that both minimize wasted data reads and enable opportunistic prefetching from remote nodes. Moreover, we analyze file redirection's impact on randomness and show that it has little effect on training efficiency. Experimental results indicate that Redox significantly accelerates data fetching in…
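The distributed property described in the abstract (no chunk is read from disk twice, and peers serve as a second data source) can be illustrated with a single-process simulation. Everything here is hypothetical: the cluster-wide queue of unread chunks, the rule of borrowing leftover files from the fullest peer buffer, and the name `simulate_cluster_epoch`. It is a toy model, and the paper's actual local and distributed protocols may differ.

```python
import random
from collections import deque
from typing import List, Tuple


def simulate_cluster_epoch(
    num_files: int, chunk_size: int, num_nodes: int, batch: int, seed: int = 0
) -> Tuple[List[int], int, int]:
    """Toy multi-node epoch: returns (served_order, disk_chunk_reads, remote_files)."""
    if min(chunk_size, num_nodes, batch) < 1:
        raise ValueError("chunk_size, num_nodes and batch must all be >= 1")
    rng = random.Random(seed)
    chunk_ids = list(range((num_files + chunk_size - 1) // chunk_size))
    rng.shuffle(chunk_ids)
    unread = deque(chunk_ids)  # hypothetical cluster-wide queue: each chunk leaves it once
    buffers: List[List[int]] = [[] for _ in range(num_nodes)]
    served: List[int] = []
    disk_reads = remote_files = 0
    while len(served) < num_files:
        for node in range(num_nodes):  # nodes take turns, `batch` files per turn
            for _ in range(batch):
                if len(served) == num_files:
                    break
                buf = buffers[node]
                if not buf and unread:
                    # Local path: batch-read one whole chunk into this node's buffer.
                    cid = unread.popleft()
                    buf.extend(range(cid * chunk_size, min((cid + 1) * chunk_size, num_files)))
                    rng.shuffle(buf)
                    disk_reads += 1
                if buf:
                    served.append(buf.pop())
                else:
                    # Remote path: no unread chunks remain, so take a leftover file
                    # from the fullest peer buffer instead of re-reading from disk.
                    peer = max(range(num_nodes), key=lambda n: len(buffers[n]))
                    served.append(buffers[peer].pop())
                    remote_files += 1
    return served, disk_reads, remote_files
```

Because every chunk leaves the shared queue exactly once, `disk_chunk_reads` always equals the number of chunks and each file appears exactly once in `served_order`; `remote_files` counts how much of the epoch tail had to come from peers. With a single node it is always zero.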

Code & Models

Repositories

brand-official/brand (PyTorch, official)

Taxonomy

Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Cloud Data Security Solutions