
TL;DR
This paper introduces Occam Gradient Descent, an algorithm that combines training and pruning to improve the efficiency and accuracy of deep learning models. It outperforms standard gradient descent on image classification and language transformer tasks, and Random Forests on tabular data.
Contribution
The paper derives a provably good algorithm that can combine any training and pruning methods, then instantiates it by combining gradient descent with magnitude pruning into Occam Gradient Descent.
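To make the idea of interleaving training with pruning concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm: the linear model, the squared-error loss, the fixed pruning schedule, and the names `magnitude_prune`, `train_with_pruning`, `prune_fraction`, and `prune_every` are all assumptions chosen for illustration. It only shows the general pattern of alternating gradient descent steps with magnitude pruning.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Deactivate the smallest-magnitude `fraction` of the still-active weights."""
    active = np.flatnonzero(mask)
    k = int(len(active) * fraction)
    if k == 0:
        return mask
    order = np.argsort(np.abs(weights[active]))
    new_mask = mask.copy()
    new_mask[active[order[:k]]] = False
    return new_mask

def train_with_pruning(X, y, epochs=20, lr=0.1, prune_fraction=0.2, prune_every=5):
    """Hypothetical loop: full-batch gradient descent with periodic magnitude pruning."""
    n, d = X.shape
    w = np.zeros(d)
    mask = np.ones(d, dtype=bool)
    for epoch in range(1, epochs + 1):
        # Gradient of the mean squared error for a linear model.
        grad = 2.0 * X.T @ (X @ w - y) / n
        # Pruned weights stay at zero because the update is masked.
        w = (w - lr * grad) * mask
        # Assumed schedule: prune every `prune_every` epochs.
        if epoch % prune_every == 0:
            mask = magnitude_prune(w, mask, prune_fraction)
            w = w * mask
    return w, mask

# Synthetic data: only the first 3 of 10 features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.01 * rng.normal(size=200)

w, mask = train_with_pruning(X, y)
print("active weights:", int(mask.sum()), "of", mask.size)
```

In this sketch the model shrinks on a fixed schedule. The abstract indicates that the paper instead identifies conditions under which combined training and pruning resists overfitting and reduces model size, so the schedule here should be read as a placeholder.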
Findings
Outperforms traditional gradient descent on image classification benchmarks.
Achieves better results than Random Forests on tabular data tasks.
Improves performance of natural language transformers over standard training.
Abstract
Deep learning neural network models must be large enough to adapt to their problem domain, while small enough to avoid overfitting training data during gradient descent. To balance these competing demands, over-provisioned deep learning models such as transformers are trained for a single epoch on large data sets, and are hence inefficient with both computing resources and training data. In response to these inefficiencies, we derive a provably good algorithm that can combine any training and pruning methods to simultaneously optimize efficiency and accuracy, identifying conditions that resist overfitting and reduce model size while outperforming the underlying training algorithm. We then use the algorithm to combine gradient descent with magnitude pruning into "Occam Gradient Descent." With respect to loss, compute and model size (a) on image classification benchmarks, linear and…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
Methods: Pruning
