An Embarrassingly Simple Way to Optimize Orthogonal Matrices at Scale

Adrián Javaloy; Antonio Vergari

arXiv:2602.14656·cs.LG·February 17, 2026

Open Access

TL;DR

This paper introduces POGO, a fast, GPU-friendly optimizer for orthogonal matrices that maintains orthogonality effectively, enabling scalable optimization in machine learning with thousands of constraints.

Contribution

POGO improves upon the Landing algorithm by supporting modern adaptive optimizers, requiring fewer hyperparameters, and keeping iterates effectively orthogonal at scale.

Findings

01

POGO outperforms recent optimizers on challenging benchmarks.

02

It can optimize thousands of orthogonal matrices in minutes.

03

The algorithm is fast and GPU-friendly, consists of only 5 matrix products, and in practice maintains orthogonality at all times.

Abstract

Orthogonality constraints are ubiquitous in robust and probabilistic machine learning. Unfortunately, current optimizers are computationally expensive and do not scale to problems with hundreds or thousands of constraints. One notable exception is the Landing algorithm (Ablin et al., 2024), which, however, comes at the expense of temporarily relaxing orthogonality. In this work, we revisit and improve on the ideas behind Landing, enabling the inclusion of modern adaptive optimizers while ensuring that orthogonality constraints are effectively met. Remarkably, these improvements come at little to no cost, and reduce the number of required hyperparameters. Our algorithm POGO is fast and GPU-friendly, consisting of only 5 matrix products, and in practice maintains orthogonality at all times. On several challenging benchmarks, POGO greatly outperforms recent optimizers and shows it can optimize…
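To make the idea of "temporarily relaxing orthogonality" concrete, here is a minimal sketch of a Landing-style update for a square matrix, as that family of methods is commonly described: a skew-symmetric "relative gradient" term plus an attraction term that pulls the iterate back toward the orthogonal group. This is not POGO; the truncated abstract does not specify its five matrix products, so none are reproduced here. The function names (`landing_step`, `orthogonality_error`, `gram_residual`) and the parameters `lr` and `lam` are hypothetical, chosen for illustration only, and pure-Python lists are used instead of a GPU array library to keep the sketch self-contained.

```python
import math


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    """Transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*a)]


def gram_residual(x):
    """Return X X^T - I for a square matrix X (zero iff X is orthogonal)."""
    xxt = matmul(x, transpose(x))
    n = len(x)
    return [
        [xxt[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]


def orthogonality_error(x):
    """Frobenius norm of X X^T - I."""
    return math.sqrt(sum(v * v for row in gram_residual(x) for v in row))


def landing_step(x, grad, lr=0.1, lam=1.0):
    """One Landing-style step (illustrative assumption, not the paper's POGO).

    field = (skew(grad X^T) + lam * (X X^T - I)) X
    The skew term moves along the constraint set to first order; the second
    term shrinks the distance to orthogonality.
    """
    n = len(x)
    gxt = matmul(grad, transpose(x))
    resid = gram_residual(x)
    direction = [
        [0.5 * (gxt[i][j] - gxt[j][i]) + lam * resid[i][j] for j in range(n)]
        for i in range(n)
    ]
    field = matmul(direction, x)
    return [[x[i][j] - lr * field[i][j] for j in range(n)] for i in range(n)]
```

With a zero gradient, repeated calls to `landing_step` on a slightly non-orthogonal matrix drive `orthogonality_error` toward zero; with a nonzero gradient on an orthogonal matrix, a single step leaves only a small, second-order deviation. That deviation is the relaxation the abstract refers to.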

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Machine Learning and Data Classification