Occam Gradient Descent

B.N. Kausik

arXiv:2405.20194 · cs.LG · September 4, 2025

TL;DR

This paper introduces Occam Gradient Descent, an algorithm that interleaves training with pruning to improve both efficiency and accuracy. It outperforms standard gradient descent on image classification and on natural language transformers, and outperforms Random Forests on tabular data.

Contribution

The paper derives a provably good algorithm that can combine any training method with any pruning method, then applies it to combine gradient descent with magnitude pruning, yielding Occam Gradient Descent.
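To make the idea of combining gradient descent with magnitude pruning concrete, here is a minimal sketch on a toy linear regression problem. It is not the paper's algorithm: the fixed prune-once-per-epoch schedule, the pruning fraction, and the names `magnitude_prune` and `train_with_pruning` are all assumptions made for illustration, and the conditions the paper derives for when and how much to prune are not reproduced here.

```python
import numpy as np

def magnitude_prune(weights, mask, fraction):
    """Return a new mask with the smallest-magnitude `fraction` of the
    currently active weights switched off (hypothetical helper)."""
    active = np.flatnonzero(mask)
    n_prune = int(fraction * active.size)
    if n_prune == 0:
        return mask
    # Sort the active weights by absolute value and drop the smallest ones.
    order = np.argsort(np.abs(weights[active]))
    new_mask = mask.copy()
    new_mask[active[order[:n_prune]]] = False
    return new_mask

def train_with_pruning(X, y, epochs=20, steps_per_epoch=50, lr=0.1, fraction=0.2):
    """Alternate plain gradient descent on squared error with magnitude pruning.
    The fixed schedule here is an assumption, not the paper's rule."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    mask = np.ones_like(w, dtype=bool)
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
            w -= lr * grad
            w *= mask  # pruned weights stay at zero
        mask = magnitude_prune(w, mask, fraction)
        w *= mask
    return w, mask

# Toy data: only 3 of 10 features actually matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[0, 3, 7]] = [2.0, -3.0, 1.5]
y = X @ w_true + 0.01 * rng.normal(size=200)

w, mask = train_with_pruning(X, y)
print("active weights:", np.flatnonzero(mask))
print("learned values:", np.round(w[mask], 2))
```

On this toy problem the three informative weights survive pruning while most of the uninformative ones are removed, which is the qualitative behavior the combination is meant to produce: a smaller model with no loss in fit.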

Findings

01. Outperforms traditional gradient descent on image classification benchmarks.

02. Achieves better results than Random Forests on tabular data tasks.

03. Improves performance of natural language transformers over standard training.

Abstract

Deep learning neural network models must be large enough to adapt to their problem domain, while small enough to avoid overfitting training data during gradient descent. To balance these competing demands, over-provisioned deep learning models such as transformers are trained for a single epoch on large data sets, and are hence inefficient with both computing resources and training data. In response to these inefficiencies, we derive a provably good algorithm that can combine any training and pruning methods to simultaneously optimize efficiency and accuracy, identifying conditions that resist overfitting and reduce model size while outperforming the underlying training algorithm. We then use the algorithm to combine gradient descent with magnitude pruning into "Occam Gradient Descent." With respect to loss, compute and model size (a) on image classification benchmarks, linear and…

Code & Models

Repositories

bnkausik/occam_gradient_descent (TensorFlow, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Pruning