Update Compression for Deep Neural Networks on the Edge
Bo Chen, Ali Bakhshi, Gustavo Batista, Brian Ng, Tat-Jun Chin

TL;DR
This paper introduces a matrix factorisation-based method to efficiently compress updates for deep neural networks on edge devices, reducing bandwidth while maintaining accuracy.
Contribution
The paper compresses model updates rather than entire models, and it outperforms similar federated learning techniques in update size reduction.
Findings
Requires less than half the update size of existing methods
Maintains comparable accuracy with reduced transmission
Effective for edge deployment scenarios
Abstract
An increasing number of artificial intelligence (AI) applications involve the execution of deep neural networks (DNNs) on edge devices. Many practical reasons motivate the need to update the DNN model on the edge device post-deployment, such as refining the model, concept drift, or outright change in the learning task. In this paper, we consider the scenario where retraining can be done on the server side based on a copy of the DNN model, with only the necessary data transmitted to the edge to update the deployed model. However, due to bandwidth constraints, we want to minimise the transmission required to achieve the update. We develop a simple approach based on matrix factorisation to compress the model update -- this differs from compressing the model itself. The key idea is to preserve existing knowledge in the current model and optimise only small additional parameters for the…
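The idea of transmitting a factorised update instead of a full model can be illustrated with a small sketch. The abstract is truncated here, so this is not the authors' formulation: it assumes a single weight matrix and uses a truncated SVD of the weight difference as a stand-in for the paper's matrix factorisation. The names `compress_update` and `apply_update`, the matrix sizes, and the rank are all hypothetical choices made for illustration.

```python
import numpy as np

def compress_update(w_old, w_new, rank):
    """Server side (hypothetical helper): factorise the update w_new - w_old
    into two thin matrices using a truncated SVD. This is an assumed
    stand-in, not necessarily the factorisation used in the paper."""
    delta = w_new - w_old
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (m, rank), singular values folded in
    b = vt[:rank, :]             # shape (rank, n)
    return a, b

def apply_update(w_old, a, b):
    """Edge side (hypothetical helper): keep the deployed weights and add
    the low-rank correction reconstructed from the two factors."""
    return w_old + a @ b

# Synthetic example: the true update is exactly rank 4 (an assumption that
# makes the reconstruction exact up to floating-point error).
rng = np.random.default_rng(0)
m, n, rank = 256, 128, 4
w_old = rng.standard_normal((m, n))
true_delta = rng.standard_normal((m, rank)) @ rng.standard_normal((rank, n))
w_new = w_old + true_delta

a, b = compress_update(w_old, w_new, rank)
w_edge = apply_update(w_old, a, b)

full_size = m * n                 # values sent if the whole matrix is shipped
sent_size = a.size + b.size       # values sent with the factorised update
rel_error = np.linalg.norm(w_edge - w_new) / np.linalg.norm(w_new)

print(f"full: {full_size}, sent: {sent_size}, relative error: {rel_error:.2e}")
```

In this toy setting the edge device receives m*rank + rank*n values rather than m*n. Real updates are rarely exactly low rank, so in practice the rank would trade transmission size against accuracy.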
Taxonomy
Topics: Age of Information Optimization · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms
