Learning Unorthogonalized Matrices for Rotation Estimation
Kerui Gu, Zhihao Li, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao

TL;DR
This paper proposes learning unorthogonalized pseudo rotation matrices (PRoM) for 3D rotation estimation, demonstrating faster convergence and improved accuracy over traditional orthogonalization methods in pose estimation tasks.
Contribution
It introduces a novel approach to learn unorthogonalized matrices for rotation estimation, removing orthogonalization steps to enhance training efficiency and performance.
Findings
PRoM converges faster than traditional methods.
PRoM achieves state-of-the-art results on pose estimation benchmarks.
Removing orthogonalization improves training efficiency and solution quality.
Abstract
Estimating 3D rotations is a common procedure for 3D computer vision. The accuracy depends heavily on the rotation representation. One form of representation, rotation matrices, is popular due to its continuity, especially for pose estimation tasks. The learning process usually incorporates orthogonalization to ensure orthonormal matrices. Our work reveals, through gradient analysis, that common orthogonalization procedures based on the Gram-Schmidt process and singular value decomposition slow down training. To this end, we advocate removing orthogonalization from the learning process and learning unorthogonalized 'Pseudo' Rotation Matrices (PRoM). An optimization analysis shows that PRoM converges faster and to a better solution. By replacing the orthogonalization incorporated representation with our proposed PRoM in various rotation-related tasks, we achieve…
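To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a Gram-Schmidt orthogonalization of a raw 3x3 network output next to a loss computed directly on the unorthogonalized matrix. The squared-error loss, the example values in `raw`, and all function names are assumptions made for illustration. Applying orthogonalization only after training is also an assumption. The SVD-based variant is omitted to keep the sketch dependency-free.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def gram_schmidt(m):
    """Build a rotation from the first two columns of a 3x3 matrix (list of rows)."""
    a1 = [m[i][0] for i in range(3)]
    a2 = [m[i][1] for i in range(3)]
    b1 = normalize(a1)
    proj = dot(b1, a2)
    b2 = normalize([a2[i] - proj * b1[i] for i in range(3)])
    b3 = cross(b1, b2)  # right-handed third axis, so the determinant is +1
    return [[b1[i], b2[i], b3[i]] for i in range(3)]

def squared_error(a, b):
    """Sum of squared element-wise differences (assumed loss, for illustration)."""
    return sum((a[i][j] - b[i][j]) ** 2 for i in range(3) for j in range(3))

def orthogonality_error(m):
    """Squared Frobenius norm of m^T m - I; zero for an orthonormal matrix."""
    err = 0.0
    for j in range(3):
        for k in range(3):
            g = sum(m[i][j] * m[i][k] for i in range(3))
            err += (g - (1.0 if j == k else 0.0)) ** 2
    return err

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical raw network output and ground-truth rotation (identity).
raw = [[0.9, -0.1, 0.05],
       [0.2, 1.1, 0.0],
       [0.0, 0.1, 0.8]]
target = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]

# Orthogonalization inside the loss: gradients would pass through gram_schmidt.
loss_orthogonalized = squared_error(gram_schmidt(raw), target)

# Pseudo-rotation-matrix style: the loss sees the raw matrix directly.
loss_unorthogonalized = squared_error(raw, target)

# Assumed post-training step: orthogonalize to obtain a valid rotation.
rotation = gram_schmidt(raw)
```

In the first variant every gradient step would be filtered through the normalization and projection operations, which are the procedures the abstract's gradient analysis refers to. In the second variant the loss acts on each matrix entry independently.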
Peer Reviews
Decision · Submitted to ICLR 2024
The topic of the paper (rotation estimation in deep learning) has some importance in the field.
- To tell the truth, I was one of the reviewers of this paper in a previous venue. I see that the authors have removed some wrong derivations, but there are still many vague and wrong claims in the paper.
- Most claims in the paper are either vague or not surprising (not showing what the authors intended to show). Most importantly, the proposed method (using a plain 3x3 matrix without any orthogonalization and instead guiding it to a rotation matrix using an additional loss, i.e., soft constrai
1. The work focuses on the choice of rotation representation, a basic but vital ingredient of the rotation estimation task. This is in contrast to most work in 3D pose estimation, and has the potential for larger impact in the field.
2. The method is simple yet effective. The motivation for the method is clearly stated: the gradient update is affected by the extra orthogonalization step, and simply removing it improves the baseline method.
3. The evaluation tasks are very extensive. These incl
1. The evaluations consider two baselines (Zhou et al., 2019; Levinson et al., 2020). Both methods use 6D representations with differences in normalization. A comparison to more recent methods [1,2] would be more convincing.
2. The proposed method is built on CLIFF (Li et al., 2022). Since the proposed method is not a novel framework, it is generally applicable to any existing method for pose estimation. Hence, I assume that PRoM could be used together with baselines to also improve lea
S1. I think the paper does a good job analyzing the problems that orthogonalization methods introduce when used in learning frameworks.
S2. I find the experiments interesting because they show the improvements PRoM can bring to human pose estimation and cloud pose estimation, among other applications.
S3. The paper is well written and easy to follow and understand. Clarity is thus high, and the work should be easy to replicate.
While I think the paper touches a very interesting topic (i.e., rotation estimation in deep learning frameworks) and shows interesting results, I have several concerns that make me a bit skeptical about the approach: 1. Lack of discussion about other possible ways to constrain orthonormality in learning problems. While the problems of orthogonalization in the learning problem may bring issues as described in the paper, there are other possible solutions that the paper does not discuss. For exam
Taxonomy
Topics: Robotics and Sensor-Based Localization · Human Pose and Action Recognition · Advanced Vision and Imaging
