
TL;DR
OrthoGrad is a geometry-aware optimizer modification that constrains gradient directions to improve neural network calibration and uncertainty estimation without changing model architecture.
Contribution
We introduce OrthoGrad, a novel orthogonality-based optimization method that enhances calibration and uncertainty quantification in neural networks.
Findings
OrthoGrad matches SGD accuracy on CIFAR-10 with 10% labels.
It significantly improves test loss and confidence measures.
Theoretically, it constrains loss reduction pathways to prevent overconfidence.
Abstract
We study OrthoGrad, a geometry-aware modification to gradient-based optimization that constrains descent directions to address overconfidence, a key limitation of standard optimizers in uncertainty-critical applications. By enforcing orthogonality between gradient updates and weight vectors, OrthoGrad alters optimization trajectories without architectural changes. On CIFAR-10 with 10% labeled data, OrthoGrad matches SGD in accuracy while achieving statistically significant improvements in test loss, predictive entropy, and confidence measures. These effects show consistent trends across corruption levels and architectures. OrthoGrad is optimizer-agnostic, incurs minimal overhead, and remains compatible with post-hoc calibration techniques. Theoretically, we characterize convergence and stationary points for a simplified OrthoGrad variant, revealing…
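The abstract describes the core idea as enforcing orthogonality between gradient updates and weight vectors. One way to read that, shown here only as a minimal sketch and not as the authors' implementation, is to subtract from the gradient its component along the weight vector before taking a plain SGD step. The function names `project_orthogonal` and `sgd_step_orthogonal`, the `eps` guard, and the choice to project over a single flat vector (rather than per layer, or with any renormalization) are all assumptions made for illustration.

```python
def project_orthogonal(grad, weight, eps=1e-30):
    """Return the part of `grad` that is orthogonal to `weight`.

    Both arguments are flat lists of floats of equal length.
    `eps` is a hypothetical guard against division by zero
    when the weight vector is all zeros.
    """
    dot_gw = sum(g * w for g, w in zip(grad, weight))
    dot_ww = sum(w * w for w in weight)
    scale = dot_gw / (dot_ww + eps)
    # Remove the component of the gradient that points along the weights.
    return [g - scale * w for g, w in zip(grad, weight)]


def sgd_step_orthogonal(weight, grad, lr=0.1):
    """One SGD step using the projected gradient (assumed update rule)."""
    g_perp = project_orthogonal(grad, weight)
    return [w - lr * g for w, g in zip(weight, g_perp)]


if __name__ == "__main__":
    w = [3.0, 4.0]
    g = [1.0, 2.0]
    g_perp = project_orthogonal(g, w)
    print("projected gradient:", g_perp)
    print("inner product with weights:", sum(a * b for a, b in zip(g_perp, w)))
    print("updated weights:", sgd_step_orthogonal(w, g))
```

Under this reading, the update can never shrink the loss simply by moving along the weight direction, which is consistent with the stated aim of limiting overconfidence; whether the paper applies the projection per layer or over all parameters is not stated in this summary.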
Taxonomy
Methods: Stochastic Gradient Descent · Softmax
