TL;DR
DeInfoReg is a decoupled learning framework that splits one long gradient flow into several shorter ones, raises training throughput through multi-GPU pipeline parallelism, and outperforms standard backpropagation in efficiency and noise resistance across various tasks.
Contribution
The paper presents a novel decoupled supervised learning approach with information regularization that enables efficient parallel training across multiple GPUs and improved performance over standard backpropagation (BP).
Findings
Significantly improves training throughput by pipelining the model across multiple GPUs.
Achieves better noise resistance than traditional BP models.
Achieves superior performance to traditional BP models across diverse tasks and datasets.
Abstract
This paper introduces Decoupled Supervised Learning with Information Regularization (DeInfoReg), a novel approach that transforms a long gradient flow into multiple shorter ones, thereby mitigating the vanishing gradient problem. Integrating a pipeline strategy, DeInfoReg enables model parallelization across multiple GPUs, significantly improving training throughput. We compare our proposed method with standard backpropagation and other gradient flow decomposition techniques. Extensive experiments on diverse tasks and datasets demonstrate that DeInfoReg achieves superior performance and better noise resistance than traditional BP models and efficiently utilizes parallel computing resources. The code for reproducibility is available at: https://github.com/ianzih/Decoupled-Supervised-Learning-for-Information-Regularization/.
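The idea of turning one long gradient flow into several shorter ones can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the information regularization loss, so a plain squared error per block stands in for it, and the scalar tanh blocks, the local readout weights `a`, and the function names are all hypothetical. Pipeline parallelism across GPUs is not shown.

```python
import math


def init_params(num_blocks):
    """Return initial block weights and local readout weights (hypothetical names)."""
    return [0.5] * num_blocks, [0.5] * num_blocks


def block_losses(xs, ys, w, a):
    """Mean local squared-error loss of every block over the dataset."""
    totals = [0.0] * len(w)
    for x, y in zip(xs, ys):
        h = x
        for k in range(len(w)):
            h = math.tanh(w[k] * h)
            totals[k] += 0.5 * (a[k] * h - y) ** 2
    return [t / len(xs) for t in totals]


def train_decoupled(xs, ys, w, a, lr=0.05, epochs=200):
    """Train each block on its own local loss; no gradient crosses a block boundary."""
    w, a = list(w), list(a)  # copy so the caller's lists are not modified
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h_in = x
            for k in range(len(w)):
                h = math.tanh(w[k] * h_in)
                err = a[k] * h - y
                # Gradients of 0.5 * err**2 with respect to this block's parameters only.
                # ASSUMPTION: squared error is a stand-in for the paper's local loss.
                grad_a = err * h
                grad_w = err * a[k] * (1.0 - h * h) * h_in
                a[k] -= lr * grad_a
                w[k] -= lr * grad_w
                # The next block treats h as a constant input (the "detach" step),
                # so its error never flows back into w[k] or a[k].
                h_in = h
    return w, a


if __name__ == "__main__":
    xs = [-1.0, -0.5, 0.5, 1.0]
    ys = [0.8 * x for x in xs]
    w0, a0 = init_params(4)
    w1, a1 = train_decoupled(xs, ys, w0, a0)
    print("per-block loss before:", block_losses(xs, ys, w0, a0))
    print("per-block loss after: ", block_losses(xs, ys, w1, a1))
```

In this sketch every parameter update uses a gradient path of length one block, regardless of how many blocks are stacked, so no update multiplies together a long chain of derivatives. Because a block needs only its own input and local target, blocks could in principle be placed on different devices and fed in a pipeline, which is the property the abstract relies on for multi-GPU throughput.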
