NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

Yongchang Hao; Yanshuai Cao; Lili Mou

arXiv:2410.20650·cs.LG·October 29, 2024

TL;DR

NeuZip is an entropy-based weight compression scheme that cuts memory usage during neural network training and inference. Training dynamics stay unchanged and inference remains near-lossless, which makes large models easier to run on memory-constrained devices.

Contribution

NeuZip presents a new entropy-based weight compression method that maintains model performance while drastically reducing memory requirements during training and inference.

Findings

01. Reduced the training memory footprint of Llama-3 8B from 31GB to under 16GB

02. Reduced inference memory usage by more than half with near-lossless performance

03. Kept training dynamics fully unchanged under compression

Abstract

The performance of neural networks improves when more parameters are used. However, model sizes are constrained by the available on-device memory during training and inference. Although techniques like quantization can alleviate this constraint, they suffer from performance degradation. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we are able to achieve memory-efficient training and inference without sacrificing performance. Notably, we significantly reduce the memory footprint of training a Llama-3 8B model from 31GB to less than 16GB, while keeping the training dynamics fully unchanged. In inference, our method can reduce memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
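To make the phrase "entropy of floating-point numbers" concrete, the Python sketch below splits float32 values into sign, exponent, and mantissa fields and measures the empirical Shannon entropy of the exponent field. It is an illustration of the general idea only, not the authors' implementation: the synthetic Gaussian "weights", the choice of float32, the focus on the exponent field, and the helper names `float32_fields` and `empirical_entropy` are all assumptions made for this example. When values cluster in a narrow magnitude range, only a few exponent patterns occur, so the measured entropy falls well below the 8 bits the field occupies, which is the kind of redundancy a lossless entropy coder can exploit.

```python
import math
import random
import struct
from collections import Counter


def float32_fields(x: float) -> tuple[int, int, int]:
    """Round x to float32 and return its (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, mantissa


def empirical_entropy(symbols) -> float:
    """Shannon entropy, in bits per symbol, of an observed symbol sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


rng = random.Random(0)
# Hypothetical stand-in for trained weights: small, zero-centered Gaussian values.
weights = [rng.gauss(0.0, 0.02) for _ in range(100_000)]

exponents = [float32_fields(w)[1] for w in weights]
exponent_entropy = empirical_entropy(exponents)

print("exponent field width: 8 bits")
print(f"empirical exponent entropy: {exponent_entropy:.2f} bits")
```

The gap between the field width and the measured entropy gives a rough upper bound on what a lossless coder could save on that field for this synthetic distribution; real checkpoints would need to be measured directly.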

Code & Models

Repositories

borealisai/neuzip (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications