TAOTF: A Two-stage Approximately Orthogonal Training Framework in Deep Neural Networks
Taoyong Cui, Jianze Li, Yuhan Dong, Li Liu

TL;DR
TAOTF is a two-stage training framework for deep neural networks that balances orthogonality and task performance, improving robustness to noisy data in image classification tasks.
Contribution
The paper introduces a novel two-stage orthogonal training framework with a polar decomposition-based initialization and soft orthogonal constraints, enhancing robustness and performance.
Findings
Achieves superior accuracy on natural and medical image datasets.
Provides stable training with improved robustness to noisy data.
Outperforms existing orthogonal constraint methods.
Abstract
Orthogonality constraints, both hard and soft, have been used to normalize the weight matrices of Deep Neural Network (DNN) models, especially Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), to reduce model parameter redundancy and improve training stability. However, the robustness of these constrained models to noisy data is not always satisfactory. In this work, we propose a novel two-stage approximately orthogonal training framework (TAOTF) that seeks a trade-off between the orthogonal solution space and the main task solution space in order to address this problem in noisy data scenarios. In the first stage, we propose a novel algorithm called polar decomposition-based orthogonal initialization (PDOI) to find a good initialization for the orthogonal optimization. In the second stage, unlike other existing methods, we apply soft orthogonal constraints…
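The abstract names two ingredients, a polar decomposition-based initialization and a soft orthogonal constraint, without giving their exact form here. The sketch below is a minimal illustration of the general ideas only, not the authors' PDOI algorithm or loss. It assumes the orthogonal polar factor Q of W = QP is approximated with a Newton-Schulz iteration (swapped in so the example needs no external library), and that the soft constraint is the common penalty ||WᵀW − I||_F². The function names `polar_orthogonal_factor` and `soft_orthogonality_penalty` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def polar_orthogonal_factor(w, iters=50):
    """Approximate the orthogonal polar factor Q of W = Q P.

    Assumption: W has full column rank. Uses the Newton-Schulz update
    X <- 1.5 X - 0.5 X (X^T X), which converges to Q when the singular
    values of the starting matrix lie in (0, sqrt(3)).
    """
    # Dividing by the Frobenius norm makes every singular value <= 1.
    fro = math.sqrt(sum(v * v for row in w for v in row))
    x = [[v / fro for v in row] for row in w]
    for _ in range(iters):
        x_xtx = matmul(x, matmul(transpose(x), x))
        x = [[1.5 * a - 0.5 * b for a, b in zip(row_x, row_c)]
             for row_x, row_c in zip(x, x_xtx)]
    return x


def soft_orthogonality_penalty(w):
    """Squared Frobenius norm of W^T W - I (an assumed soft constraint)."""
    gram = matmul(transpose(w), w)
    n = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


# Example: a non-orthogonal weight matrix and its orthogonal polar factor.
w = [[2.0, 1.0], [0.0, 1.0]]
q = polar_orthogonal_factor(w)
print("penalty before:", soft_orthogonality_penalty(w))
print("penalty after: ", soft_orthogonality_penalty(q))
```

In a training setup along these lines, the penalty would typically be added to the task loss with a weighting coefficient, so that weights are pulled toward orthogonality without being forced onto it. For real workloads an SVD-based routine such as `scipy.linalg.polar` would usually replace the hand-written iteration.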
Taxonomy
Topics: Advanced Neural Network Applications · Medical Image Segmentation Techniques · Industrial Vision Systems and Defect Detection
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Linear Layer
