Training Without Orthogonalization, Inference With SVD: A Gradient Analysis of Rotation Representations
Chris Choy

TL;DR
This paper provides a theoretical gradient analysis of SVD orthogonalization in rotation representations. It shows that removing SVD from training avoids gradient distortion and justifies direct 9D regression, with SVD projection applied only at inference, for better rotation estimation.
Contribution
The paper offers a detailed gradient analysis of SVD orthogonalization. It explains how SVD affects training and justifies 9D regression with SVD projection applied only at inference.
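A minimal NumPy sketch of the "train without orthogonalization, project at inference" recipe follows. It rests on several assumptions. The raw 3×3 network output is called `M_pred` (a hypothetical name). The training loss is a plain squared Frobenius distance to the ground-truth rotation, which is an assumed choice and not necessarily the paper's loss. The projection is SVD with a determinant sign fix, so the result lies in SO(3). A toy loop that optimizes the 9D output directly stands in for a network.

```python
import numpy as np


def svd_project(M):
    """Project a 3x3 matrix onto SO(3) via SVD (special orthogonalization).

    Flipping the last singular direction when needed guarantees det(R) = +1.
    """
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def training_loss_and_grad(M_pred, R_gt):
    """Assumed training loss on the raw 9D output, with no orthogonalization.

    loss = ||M_pred - R_gt||_F^2, so d(loss)/d(M_pred) = 2 (M_pred - R_gt):
    the gradient reaches all nine entries without passing through an SVD.
    """
    diff = M_pred - R_gt
    return float(np.sum(diff ** 2)), 2.0 * diff


def predict_rotation(M_pred):
    """Inference: orthogonalize once, after the network output."""
    return svd_project(M_pred)


# Toy stand-in for a network: optimize the 9D output directly by gradient descent.
rng = np.random.default_rng(0)
R_gt = svd_project(rng.standard_normal((3, 3)))
M_pred = rng.standard_normal((3, 3))
for _ in range(200):
    loss, grad = training_loss_and_grad(M_pred, R_gt)
    M_pred -= 0.1 * grad
R_hat = predict_rotation(M_pred)
```

The SVD appears only in `predict_rotation`, so the backward pass never sees its rank-deficient Jacobian. The training gradient is simply proportional to the residual in all nine coordinates.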
Findings
The Jacobian of the SVD backward pass has rank 3, matching the dimension of SO(3), and its nonzero singular values are derived exactly.
Gradient distortion is most severe when the predicted matrix is far from SO(3).
Removing SVD from training avoids this gradient distortion and improves rotation estimation.
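The rank-3 finding can be checked numerically. The sketch below is our own illustration, not code from the paper. It differentiates the SVD projection by central finite differences and compares the nonzero singular values against 2/(s_i + s_j) for i < j, where s_i are the singular values of M. That closed form comes from differentiating the polar decomposition M = R P when det(M) > 0, and it is an assumption here, not a quotation of the paper's expression. `numerical_jacobian`, `predicted_spectrum`, and `condition_number` are hypothetical helper names.

```python
import numpy as np


def svd_project(M):
    """Project a 3x3 matrix onto SO(3) via SVD with a determinant sign fix."""
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def numerical_jacobian(f, M, eps=1e-6):
    """9x9 Jacobian of vec(f(M)) w.r.t. vec(M) by central differences."""
    J = np.zeros((9, 9))
    for k in range(9):
        E = np.zeros(9)
        E[k] = eps
        E = E.reshape(3, 3)
        J[:, k] = (f(M + E) - f(M - E)).ravel() / (2.0 * eps)
    return J


def predicted_spectrum(M):
    """Assumed closed form (det(M) > 0): 2 / (s_i + s_j) for i < j, descending."""
    s = np.linalg.svd(M, compute_uv=False)
    vals = [2.0 / (s[i] + s[j]) for i, j in [(0, 1), (0, 2), (1, 2)]]
    return np.sort(vals)[::-1]


def condition_number(M):
    """Largest over smallest value of predicted_spectrum(M)."""
    s = np.linalg.svd(M, compute_uv=False)  # s[0] >= s[1] >= s[2]
    return (s[0] + s[1]) / (s[1] + s[2])


# A fixed test matrix with det(M) > 0, so the sign fix stays inactive.
M = np.array([[2.0, 0.3, -0.1],
              [0.1, 1.5, 0.4],
              [-0.2, 0.2, 0.8]])
sv = np.linalg.svd(numerical_jacobian(svd_project, M), compute_uv=False)
rank = int(np.sum(sv > 1e-6 * sv[0]))
print("rank:", rank)
print("numerical:  ", sv[:3])
print("closed form:", predicted_spectrum(M))

# At a rotation every s_i = 1, so kappa = 1; stretching M anisotropically
# away from SO(3) raises kappa, e.g. diag(5, 1, 0.2) gives (5+1)/(1+0.2) = 5.
print(condition_number(np.eye(3)), condition_number(np.diag([5.0, 1.0, 0.2])))
```

The six remaining singular values are numerically zero. Those null directions are the symmetric part of a perturbation to M, which the projection discards. Under this closed form, the ratio (s_1 + s_2)/(s_2 + s_3) grows as the singular values of M spread apart. This is one concrete sense in which the distortion worsens away from SO(3).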
Abstract
Recent work has shown that removing orthogonalization during training and applying it only at inference improves rotation estimation in deep learning, with empirical evidence favoring 9D representations with SVD projection. However, the theoretical understanding of why SVD orthogonalization specifically harms training, and why it should be preferred over Gram-Schmidt at inference, remains incomplete. We provide a detailed gradient analysis of SVD orthogonalization specialized to 3×3 matrices and SO(3) projection. Our central result derives the exact spectrum of the SVD backward pass Jacobian. It has rank 3, matching the dimension of SO(3), and its nonzero singular values and condition number create quantifiable gradient distortion that is most severe when the predicted matrix is far from SO(3) (e.g., early in training…
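The abstract contrasts SVD and Gram-Schmidt at inference. The following is a well-known property offered as background, not as the paper's argument. SVD projection with a sign fix returns the rotation closest to M in Frobenius norm (the special orthogonal Procrustes solution). Gram-Schmidt on the first two columns, the 6D-style construction, treats the columns asymmetrically and is not optimal in general. The sketch checks this on noisy copies of a rotation. `gram_schmidt_project` is a hypothetical helper name, and the noise level 0.2 is an arbitrary choice.

```python
import numpy as np


def svd_project(M):
    """Nearest rotation to M in Frobenius norm (special orthogonalization)."""
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt


def gram_schmidt_project(M):
    """Gram-Schmidt on columns 0 and 1; third column by cross product."""
    a, b = M[:, 0], M[:, 1]
    e1 = a / np.linalg.norm(a)
    b_perp = b - np.dot(e1, b) * e1
    e2 = b_perp / np.linalg.norm(b_perp)
    e3 = np.cross(e1, e2)  # right-handed, so det = +1
    return np.stack([e1, e2, e3], axis=1)


rng = np.random.default_rng(0)
R_true = svd_project(rng.standard_normal((3, 3)))
d_svd, d_gs = [], []
for _ in range(1000):
    M = R_true + 0.2 * rng.standard_normal((3, 3))  # noisy 9D prediction
    d_svd.append(np.linalg.norm(svd_project(M) - M))
    d_gs.append(np.linalg.norm(gram_schmidt_project(M) - M))
print("mean distance to M, SVD:", np.mean(d_svd), "Gram-Schmidt:", np.mean(d_gs))
```

Both outputs are valid rotations, but only the SVD output is guaranteed to be the closest one to the raw prediction. Gram-Schmidt instead trusts the first column fully and discards the third column's information.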
