One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning
Chaosheng Dong, Xiaojie Jin, Weihao Gao, Yijia Wang, Hongyi Zhang, Xiang Wu, Jianchao Yang, Xiaobing Liu

TL;DR
This paper introduces a novel subsampling method for large-scale deep learning that leverages information from inference passes to improve data selection, enhancing training efficiency and effectiveness.
Contribution
It proposes a new framework and algorithm that utilize forward pass information during inference to better select training data, addressing the challenge of large-scale streaming data.
Findings
Improved data sampling leads to better model training efficiency.
The method outperforms traditional ad-hoc sampling baselines.
Effective on large-scale classification and regression tasks.
Abstract
Deep learning models in large-scale machine learning systems are often continuously trained with enormous data from production environments. The sheer volume of streaming training data poses a significant challenge to real-time training subsystems and ad-hoc sampling is the standard practice. Our key insight is that these deployed ML systems continuously perform forward passes on data instances during inference, but ad-hoc sampling does not take advantage of this substantial computational effort. Therefore, we propose to record a constant amount of information per instance from these forward passes. The extra information measurably improves the selection of which data instances should participate in forward and backward passes. A novel optimization framework is proposed to analyze this problem and we provide an efficient approximation algorithm under the framework of Mini-batch gradient…
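The abstract does not say what is recorded per instance or how the selection is made, so the following is only a minimal sketch under stated assumptions. It assumes the recorded quantity is a single per-instance loss from the inference-time forward pass, and it swaps in plain loss-proportional importance sampling as a stand-in for the paper's approximation algorithm. The function name, the `floor` parameter, and the sample values are all hypothetical.

```python
import random

def subsample_by_recorded_score(scores, budget, seed=0, floor=1e-3):
    """Pick `budget` instance indices with probability proportional to a
    score recorded at inference time (assumed here to be the loss).

    Returns the chosen indices and importance weights 1 / (n * p_i), which
    keep a weighted mini-batch gradient unbiased for the full-data mean.
    """
    rng = random.Random(seed)
    # Clip scores from below so every instance keeps a nonzero probability.
    clipped = [max(s, floor) for s in scores]
    total = sum(clipped)
    probs = [c / total for c in clipped]
    n = len(scores)
    # Sample with replacement according to the recorded scores.
    idx = rng.choices(range(n), weights=probs, k=budget)
    weights = [1.0 / (n * probs[i]) for i in idx]
    return idx, weights

# Hypothetical losses recorded from ten inference-time forward passes.
recorded_losses = [0.05, 0.10, 2.30, 0.02, 1.10, 0.40, 0.01, 0.90, 0.03, 3.00]
idx, weights = subsample_by_recorded_score(recorded_losses, budget=3)
```

Only the instances in `idx` would then go through the backward pass, with each instance's loss scaled by its entry in `weights`. Uniform ad-hoc sampling corresponds to giving every instance the same score.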
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques
