R2F: A Remote Retraining Framework for AIoT Processors with Computing Errors
Dawen Xu, Meng He, Cheng Liu, Ying Wang, Long Cheng, Huawei Li, Xiaowei Li, Kwang-Ting Cheng

TL;DR
This paper introduces R2F, a remote retraining framework for AIoT processors with soft errors, improving model resilience and accuracy while optimizing data transmission and retraining efficiency.
Contribution
The paper proposes a novel remote retraining framework (R2F) for AIoT processors with errors, including an optimized partial TMR strategy and a sparse increment compression method.
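The two techniques named above can be illustrated with a minimal Python sketch. This is a generic illustration under stated assumptions, not the paper's implementation: the summary does not say which computations the partial TMR strategy triplicates or how the sparse increments are encoded. The function names (`tmr_vote`, `sparse_increment`, `apply_increment`) and the `threshold` parameter are hypothetical.

```python
from typing import Dict, List


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant results (generic TMR).

    A "partial" TMR scheme would apply this only to a chosen subset of
    computations; which subset R2F protects is not stated in this summary.
    """
    return (a & b) | (a & c) | (b & c)


def sparse_increment(base: List[float], retrained: List[float],
                     threshold: float = 0.0) -> Dict[int, float]:
    """Keep only the weight changes whose magnitude exceeds `threshold`.

    Assumption: the increment is stored as an index -> delta mapping,
    so unchanged weights are never transmitted.
    """
    return {
        i: r - b
        for i, (b, r) in enumerate(zip(base, retrained))
        if abs(r - b) > threshold
    }


def apply_increment(base: List[float], increment: Dict[int, float]) -> List[float]:
    """Rebuild the retrained weights on the device from base + increment."""
    updated = list(base)
    for i, delta in increment.items():
        updated[i] += delta
    return updated


# One corrupted copy out of three is outvoted by the two matching copies.
voted = tmr_vote(0b1010, 0b1010, 0b0110)

# Only the single changed weight is kept in the increment.
inc = sparse_increment([1.0, 2.0, 3.0], [1.0, 2.5, 3.0])
restored = apply_increment([1.0, 2.0, 3.0], inc)
```

In this sketch the server would send only `inc` rather than the full weight list, which is the general idea behind reducing transmission with sparse updates.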
Findings
Top-5 accuracy improved by up to 13.73%
Retraining time reduced by 38%-88% with minimal accuracy loss
R2F balances accuracy and performance penalties effectively
Abstract
AIoT processors fabricated at newer technology nodes suffer from rising soft error rates due to shrinking transistor sizes and lower supply voltage. Soft errors on AIoT processors, particularly on deep learning accelerators (DLAs) with massive computing, may cause substantial computing errors. These computing errors are difficult to capture with conventional training on general-purpose processors such as CPUs and GPUs in a server. Directly applying offline-trained neural network models to edge accelerators with errors may lead to considerable loss of prediction accuracy. To address this problem, we propose a remote retraining framework (R2F) for remote AIoT processors with computing errors. It places the remote AIoT processor with soft errors in the training loop so that the on-site computing errors can be learned with the application data on the server and the retrained models…
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Machine Learning and ELM
