P2SGrad: Refined Gradients for Optimizing Deep Face Models
Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li

TL;DR
This paper introduces P2SGrad, a hyper-parameter-free gradient that improves the training stability and performance of deep face recognition models by directly optimizing cosine similarity, the metric used at test time.
Contribution
The paper unifies previous cosine softmax losses by analyzing their gradients, then proposes P2SGrad, a novel adaptive gradient based on cosine similarity that eliminates hyper-parameter tuning.
Findings
Achieves state-of-the-art results on LFW, MegaFace, and IJB-C.
Demonstrates a stable, noise-robust training process.
Trains faster and more efficiently than traditional methods.
Abstract
Cosine-based softmax losses significantly improve the performance of deep face recognition networks. However, these losses always include sensitive hyper-parameters that can make the training process unstable, and it is very tricky to set suitable hyper-parameters for a specific dataset. This paper addresses this challenge by directly designing the gradients for adaptively training deep neural networks. We first investigate and unify previous cosine softmax losses by analyzing their gradients. This unified view inspires us to propose a novel gradient called P2SGrad (Probability-to-Similarity Gradient), which leverages cosine similarity instead of classification probability to directly update the testing metrics when updating neural network parameters. P2SGrad is adaptive and hyper-parameter-free, which makes the training process more efficient and faster. We evaluate our P2SGrad on three…
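The abstract's central idea, using cosine similarity in place of classification probability in the gradient, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the excerpt above gives no formula, so the sketch assumes that the per-class gradient coefficient of a softmax loss, (probability minus one-hot label), is replaced by (cosine similarity minus one-hot label). The function names `cosine_logits`, `softmax_coeff`, and `p2s_coeff`, and the `scale` parameter, are hypothetical and chosen only for illustration.

```python
import numpy as np

def cosine_logits(features, class_weights):
    """Cosine similarity between every feature row and every class-weight row."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    return f @ w.T  # shape (batch, num_classes), values in [-1, 1]

def softmax_coeff(cos, labels, scale):
    """Per-class coefficient of a scaled cosine softmax loss: P - onehot.

    `scale` stands in for the kind of sensitive hyper-parameter the
    abstract refers to (an assumption made for this illustration).
    """
    z = scale * cos
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    onehot = np.eye(cos.shape[1])[labels]
    return p - onehot

def p2s_coeff(cos, labels):
    """Assumed P2SGrad-style coefficient: cos - onehot, with no hyper-parameter."""
    onehot = np.eye(cos.shape[1])[labels]
    return cos - onehot

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8))      # toy batch of 4 embeddings
    weights = rng.normal(size=(3, 8))    # toy class centers for 3 identities
    labels = np.array([0, 2, 1, 0])
    cos = cosine_logits(feats, weights)
    print(softmax_coeff(cos, labels, scale=30.0))  # changes with the chosen scale
    print(p2s_coeff(cos, labels))                  # depends only on the cosines
```

Under this assumption, the coefficient for the target class reaches zero exactly when the feature is perfectly aligned with its class weight (cosine of 1), which ties the update directly to the similarity used at test time. The softmax coefficient, by contrast, changes with the chosen `scale`.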
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Face and Expression Recognition
Methods: Softmax
