AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon

TL;DR
AutoAssist is a framework that accelerates deep neural network training by filtering out less informative instances using a lightweight assistant network, reducing training time significantly while maintaining accuracy.
Contribution
The paper introduces AutoAssist, a novel instance filtering framework with an assistant network to speed up deep neural network training, outperforming traditional importance sampling methods.
Findings
Reduces training epochs by 40% for ResNet.
Saves 30% training time for transformer models.
Maintains comparable accuracy and BLEU scores.
Abstract
Deep neural networks have yielded superior performance in many applications; however, the gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the amount of improvement in the current model by a stochastic gradient update on each instance varies dynamically. In AutoAssist, we utilize this fact and design a simple instance shrinking operation, which is used to filter out instances with relatively low marginal improvement to the current model; thus the computationally intensive gradient computations are performed on informative instances as much as possible. We prove that the proposed technique outperforms vanilla SGD with existing importance…
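The instance filtering idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Assistant` class, the `shrink` function, the choice of logistic regression as the lightweight model, the `floor` parameter, and the convention that a label of 1 means "this instance was informative" are all assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Assistant:
    """Hypothetical lightweight assistant: logistic regression over instance features."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        # Estimated probability that instance x is informative for the main model.
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, x, label):
        # One SGD step on the logistic loss. Assumed labeling rule: label is 1
        # when the main model's loss on x was high, 0 otherwise.
        err = self.score(x) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def shrink(batch, assistant, rng, floor=0.1):
    """Keep each instance with probability max(floor, assistant.score(x)).

    The floor (an assumption of this sketch) ensures that instances the
    assistant considers uninformative are still sampled occasionally.
    """
    return [x for x in batch if rng.random() < max(floor, assistant.score(x))]


# Usage: train the assistant on two toy feature vectors, then filter a batch.
assistant = Assistant(dim=2)
for _ in range(200):
    assistant.update([1.0, 0.0], 1)  # treated as informative
    assistant.update([0.0, 1.0], 0)  # treated as uninformative

batch = [[1.0, 0.0], [0.0, 1.0]] * 4
kept = shrink(batch, assistant, random.Random(0))
```

In a real training loop, only the instances in `kept` would be passed to the expensive gradient computation of the main network, while the cheap assistant would be updated from the losses the main network reports.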
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization
