BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman Hosny, Marina Neseem, Sherief Reda

TL;DR
BitTrain exploits activation sparsity with a bitmap compression format: activations are saved in compressed form during the forward pass and restored during the backward pass, reducing the training memory footprint on edge devices by up to 34% (up to 56% with additional pruning).
Contribution
The paper proposes BitTrain, a new method that compresses activation memory during training using bitmap compression, improving memory efficiency without sacrificing accuracy.
Findings
Up to 34% reduction in memory footprint at 50% sparsity.
Over 70% sparsity achieved with further pruning, reducing memory by up to 56%.
Seamless integration with modern deep learning frameworks.
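As a rough way to relate sparsity to savings, the sketch below estimates the size of a single bitmap-compressed tensor relative to its dense size. It assumes a generic layout (one mask bit per element plus the nonzero values stored at full width, 32-bit by default) and ignores any other overhead; the function name `bitmap_compressed_fraction` is hypothetical and the layout is an assumption, not the paper's exact format. The figures listed above are the paper's reported measurements and need not match this idealized per-tensor estimate.

```python
def bitmap_compressed_fraction(sparsity: float, value_bits: int = 32) -> float:
    """Fraction of the dense size used by a 1-bit mask plus the nonzero values.

    Assumed layout (illustrative only): one mask bit per element, and each
    nonzero element stored at its full width of `value_bits` bits.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    mask_fraction = 1.0 / value_bits      # 1 bit per element vs. value_bits
    value_fraction = 1.0 - sparsity       # only nonzeros keep their values
    return mask_fraction + value_fraction


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.7):
        frac = bitmap_compressed_fraction(s)
        print(f"sparsity {s:.0%}: {frac:.1%} of dense size")
```

Under these assumptions the format only pays off once sparsity exceeds 1/32 (about 3%) for 32-bit values; below that, the mask costs more than the zeros it removes.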
Abstract
Training on the Edge enables neural networks to learn continuously from new data after deployment on memory-constrained edge devices. Previous work is mostly concerned with reducing the number of model parameters, which is only beneficial for inference. However, the memory footprint from activations is the main bottleneck for training on the edge. Existing incremental training methods fine-tune only the last few layers, sacrificing the accuracy gains from re-training the whole model. In this work, we investigate the memory footprint of training deep learning models, and use our observations to propose BitTrain. In BitTrain, we exploit activation sparsity and propose a novel bitmap compression technique that reduces the memory footprint during training. We save the activations in our proposed bitmap compression format during the forward pass of the training, and restore them during the backward pass…
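The save-then-restore idea can be illustrated with a generic bitmap encoding: a bit mask records which activations are nonzero, and only the nonzero values are kept. This is a minimal sketch using the Python standard library, not the authors' implementation; the function names `bitmap_compress` and `bitmap_decompress`, the flat-list input, and the float32 storage are all assumptions made for illustration.

```python
from array import array


def bitmap_compress(activations):
    """Encode a flat sequence of floats as (bitmap, nonzero_values, length)."""
    n = len(activations)
    bitmap = bytearray((n + 7) // 8)   # one bit per element, rounded up to bytes
    values = array("f")                # float32 storage for the nonzeros only
    for i, a in enumerate(activations):
        if a != 0.0:
            bitmap[i >> 3] |= 1 << (i & 7)   # set bit i (little-endian in byte)
            values.append(a)
    return bitmap, values, n


def bitmap_decompress(bitmap, values, n):
    """Rebuild the dense float32 array from the bitmap and the nonzero values."""
    dense = array("f", [0.0]) * n      # start from all zeros
    nonzeros = iter(values)
    for i in range(n):
        if bitmap[i >> 3] & (1 << (i & 7)):
            dense[i] = next(nonzeros)
    return dense


if __name__ == "__main__":
    # Hypothetical post-ReLU activations: many exact zeros.
    relu_out = [0.0, 1.5, 0.0, 0.0, 2.25, 0.0, 0.5, 0.0, 0.0, 3.0]

    # "Forward pass": keep only the compressed form.
    bitmap, values, n = bitmap_compress(relu_out)

    # "Backward pass": restore the dense activations when they are needed.
    restored = bitmap_decompress(bitmap, values, n)
    assert list(restored) == relu_out

    dense_bytes = 4 * n
    packed_bytes = len(bitmap) + values.itemsize * len(values)
    print(f"dense: {dense_bytes} bytes, compressed: {packed_bytes} bytes")
```

In a real training loop the compression would be applied to tensors inside the framework's save-for-backward mechanism; the element-by-element Python loop here is only meant to make the encoding explicit.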
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Pruning
