OrthoGrad Improves Neural Calibration

C. Evans Hedges

arXiv:2506.04487 · cs.LG · September 29, 2025

TL;DR

OrthoGrad is a geometry-aware optimizer modification that constrains gradient directions to improve neural network calibration and uncertainty estimation without changing model architecture.

Contribution

We study OrthoGrad, an orthogonality-based modification to gradient-based optimization, and show that it improves calibration and uncertainty quantification in neural networks.

Findings

1. OrthoGrad matches SGD accuracy on CIFAR-10 trained with 10% labeled data.
2. It yields statistically significant improvements in test loss, predictive entropy, and confidence measures.
3. Theoretically, it constrains loss-reduction pathways to prevent overconfidence.
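One way to make "constrained loss-reduction pathways" concrete is the following identity. It is a sketch under an assumption, not the paper's theorem: we assume OrthoGrad removes the component of each layer's gradient $g$ that is parallel to that layer's flattened weights $w$, and we write $\eta$ for the learning rate. The projected update then cannot grow the weight norm through the first-order term:

$$
g_\perp = g - \frac{\langle w, g \rangle}{\langle w, w \rangle}\, w, \qquad \langle w, g_\perp \rangle = 0,
$$

$$
\| w - \eta g_\perp \|^2 = \|w\|^2 - 2\eta \langle w, g_\perp \rangle + \eta^2 \|g_\perp\|^2 = \|w\|^2 + \eta^2 \|g_\perp\|^2 .
$$

Uniformly scaling up the weights that produce the logits sharpens softmax outputs without changing the predicted class. Removing the radial component of the gradient blocks that first-order route to lower training loss, which is one plausible reading of the overconfidence argument above.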

Abstract

We study ⊥Grad, a geometry-aware modification to gradient-based optimization that constrains descent directions to address overconfidence, a key limitation of standard optimizers in uncertainty-critical applications. By enforcing orthogonality between gradient updates and weight vectors, ⊥Grad alters optimization trajectories without architectural changes. On CIFAR-10 with 10% labeled data, ⊥Grad matches SGD in accuracy while achieving statistically significant improvements in test loss ($p = 0.05$), predictive entropy ($p = 0.001$), and confidence measures. These effects show consistent trends across corruption levels and architectures. ⊥Grad is optimizer-agnostic, incurs minimal overhead, and remains compatible with post-hoc calibration techniques. Theoretically, we characterize convergence and stationary points for a simplified ⊥Grad variant, revealing…
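The orthogonality constraint described in the abstract can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's implementation. It assumes that the constraint is applied per weight tensor, flattened to a vector, by subtracting the gradient component parallel to the weights. The helper names `orthogonalize` and `sgd_step`, the `eps` guard, and the optional `rescale` step, which restores the original gradient norm after projection, are all hypothetical choices made here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize(grad, weight, eps=1e-30, rescale=True):
    """Hypothetical helper: drop the component of `grad` parallel to `weight`.

    eps guards against division by zero when the weights are all zero;
    rescale=True (an assumption, not from the paper) restores ||grad||.
    """
    coef = dot(weight, grad) / (dot(weight, weight) + eps)
    g_perp = [g - coef * w for g, w in zip(grad, weight)]
    if rescale:
        norm_g = math.sqrt(dot(grad, grad))
        norm_perp = math.sqrt(dot(g_perp, g_perp))
        if norm_perp > eps:
            scale = norm_g / norm_perp
            g_perp = [scale * x for x in g_perp]
    return g_perp


def sgd_step(weight, grad, lr=0.1, rescale=True):
    """Hypothetical plain-SGD step that uses the orthogonalized gradient."""
    g_perp = orthogonalize(grad, weight, rescale=rescale)
    return [w - lr * g for w, g in zip(weight, g_perp)]


if __name__ == "__main__":
    w = [1.0, 2.0, 2.0]
    g = [0.5, -1.0, 3.0]
    print(orthogonalize(g, w, rescale=False))  # [0.0, -2.0, 2.0]
    print(dot(orthogonalize(g, w), w))         # 0.0: no radial component left
    print(dot(sgd_step(w, g), sgd_step(w, g)) >= dot(w, w))  # True
```

Because the projection only rewrites the gradient, the same `orthogonalize` call could run before a momentum or Adam update instead of plain SGD. This matches the abstract's description of ⊥Grad as optimizer-agnostic, and the extra cost is a few dot products per tensor.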

Taxonomy

Methods: Stochastic Gradient Descent · Softmax