Speeding up Model Loading with fastsafetensors

Takeshi Yoshimura; Tatsuhiro Chiba; Manish Sethi; Daniel Waddington; Swaminathan Sundararaman

arXiv:2505.23072·cs.DC·May 30, 2025

TL;DR

This paper introduces fastsafetensors, a Python library that significantly accelerates loading large pre-trained models by optimizing tensor deserialization and data transfer processes, achieving up to 7.5x speedup.

Contribution

The work presents a novel deserialization approach for safetensors files that improves model loading speed through direct device memory instantiation and low-level I/O optimizations.
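One of the low-level I/O ideas, parallelized copying, can be illustrated with a minimal sketch. This is not the fastsafetensors implementation: it uses only the Python standard library, targets a host-memory buffer rather than device memory, and the function name `parallel_read` and its `num_workers` parameter are hypothetical. It assumes a POSIX system where `os.pread` is available.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_read(path, num_workers=4):
    """Fill one preallocated buffer using concurrent positional reads.

    Hypothetical helper for illustration only; not part of fastsafetensors.
    """
    size = os.path.getsize(path)
    dest = bytearray(size)              # single destination buffer
    view = memoryview(dest)
    chunk = max(1, -(-size // num_workers))  # ceiling division
    fd = os.open(path, os.O_RDONLY)
    try:
        def fill(start):
            # Each worker owns the byte range [start, end) of the file.
            pos, end = start, min(start + chunk, size)
            while pos < end:
                data = os.pread(fd, end - pos, pos)  # may return short reads
                if not data:
                    raise EOFError("file shrank while reading")
                view[pos:pos + len(data)] = data
                pos += len(data)

        with ThreadPoolExecutor(max_workers=num_workers) as pool:
            # list() forces evaluation so worker exceptions propagate.
            list(pool.map(fill, range(0, size, chunk)))
    finally:
        os.close(fd)
    return bytes(dest)


if __name__ == "__main__":
    payload = os.urandom(1_000_003)
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    try:
        print(parallel_read(tmp.name) == payload)
    finally:
        os.unlink(tmp.name)
```

Because `os.pread` takes an explicit offset, the workers share one file descriptor without contending over a file position. Whether threads actually raise throughput depends on the storage device and file size.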

Findings

01

Achieves 4.8x to 7.5x faster model loading times

02

Effective for models with up to 176 billion parameters

03

Improves I/O and tensor preprocessing efficiency

Abstract

The rapid increase in model parameter sizes introduces new challenges in pre-trained model loading. Currently, machine learning code often deserializes each parameter as a tensor object in host memory before copying it to device memory. We found that this approach underutilizes storage throughput and significantly slows down the loading of large models stored in a widely used model file format, safetensors. In this work, we present fastsafetensors, a Python library designed to optimize the deserialization of tensors in safetensors files. Our approach first copies groups of on-disk parameters to device memory, where they are directly instantiated as tensor objects. This design enables further optimization in low-level I/O and high-level tensor preprocessing, including parallelized copying, peer-to-peer DMA, and GPU offloading. Experimental results show performance improvements of 4.8x to 7.5x in…
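The idea of copying a group of parameters in bulk and then instantiating tensors in place can be sketched on the host side using the safetensors file layout: an 8-byte little-endian header length, a JSON header giving each tensor's `dtype`, `shape`, and `data_offsets`, and then one contiguous byte buffer. The sketch below is an illustration only, not the fastsafetensors API: the helpers `build_safetensors` and `load_grouped` are hypothetical, it handles only `F32`, it stays in host memory, and the zero-copy cast assumes a little-endian machine.

```python
import io
import json
import struct


def build_safetensors(tensors):
    """Serialize {name: (shape, float_values)} into safetensors-format bytes.

    Hypothetical helper; supports float32 ("F32") only.
    """
    header, payload = {}, bytearray()
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            # Offsets are relative to the start of the data region.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(payload)


def load_grouped(stream):
    """Read the whole data region once, then expose each tensor as a view."""
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len))
    header.pop("__metadata__", None)    # optional free-form metadata entry
    buffer = memoryview(stream.read())  # one bulk read for all tensors
    views = {}
    for name, meta in header.items():
        begin, end = meta["data_offsets"]
        # Slicing and casting a memoryview does not copy the bytes.
        views[name] = (meta["shape"], buffer[begin:end].cast("f"))
    return views


if __name__ == "__main__":
    blob = build_safetensors({
        "w": ([2, 2], [1.0, 2.0, 3.0, 4.0]),
        "b": ([2], [0.5, -0.5]),
    })
    for name, (shape, data) in load_grouped(io.BytesIO(blob)).items():
        print(name, shape, list(data))
```

The per-tensor work here is only offset arithmetic on an already-loaded buffer, which is the property that lets a loader issue a few large reads instead of one deserialization and copy per parameter.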

Code & Models

Repositories

foundation-model-stack/fastsafetensors
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Parallel Computing and Optimization Techniques