Softmax Dissection: Towards Understanding Intra- and Inter-class Objective for Embedding Learning
Lanqing He, Zhongdao Wang, Yali Li, Shengjin Wang

TL;DR
This paper introduces D-Softmax, a dissection of the softmax loss into separate intra- and inter-class objectives, enabling better tuning and faster training in embedding learning tasks like face recognition.
Contribution
The paper proposes D-Softmax, which disentangles intra- and inter-class objectives in softmax loss, and introduces sampling variants to reduce computation and accelerate training.
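A minimal sketch of the dissection idea, in Python. It makes several assumptions. It uses plain logits s_k, although face-recognition losses usually use scaled cosine similarities. It also assumes the dissected form breaks the coupling by comparing s_y and each s_k against a fixed constant eps rather than against each other. The helper names log1p_sum_exp, softmax_loss and d_softmax_loss are hypothetical, and the paper's exact formulation may differ.

```python
import math


def log1p_sum_exp(xs):
    """Numerically stable log(1 + sum(exp(x) for x in xs))."""
    m = max([0.0, *xs])  # include the implicit 1 = exp(0) in the max
    return m + math.log(math.exp(-m) + sum(math.exp(x - m) for x in xs))


def softmax_loss(logits, y):
    """Standard softmax cross-entropy for one sample with target class y.

    Rewritten as log(1 + sum_{k != y} exp(s_k - s_y)): the target logit s_y
    and every non-target logit s_k appear in the same differences, so the
    intra-class and inter-class objectives are coupled.
    """
    s_y = logits[y]
    return log1p_sum_exp([s - s_y for k, s in enumerate(logits) if k != y])


def d_softmax_loss(logits, y, eps):
    """Hypothetical dissected loss: returns (intra, inter) separately.

    ASSUMPTION: the coupling is broken by comparing s_y and each s_k against
    a fixed constant eps instead of against each other.
    """
    s_y = logits[y]
    # Intra-class term: pushes s_y above eps and ignores the non-target logits.
    intra = log1p_sum_exp([eps - s_y])
    # Inter-class term: pushes every non-target s_k below eps and ignores s_y.
    inter = log1p_sum_exp([s - eps for k, s in enumerate(logits) if k != y])
    return intra, inter


logits = [8.0, 1.0, -2.0, 0.5]
print(softmax_loss(logits, 0))
print(d_softmax_loss(logits, 0, eps=4.0))
```

Because intra depends only on s_y and inter depends only on the non-target logits, each term can be weighted or tuned on its own. That separability is what the dissection is meant to provide.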
Findings
D-Softmax performs comparably to SphereFace and ArcFace on face verification.
Sampling variants of D-Softmax significantly speed up training (up to 64x).
Fast variants maintain high performance with minor accuracy loss.
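The sampling variants rest on the observation that the full inter-class objective is redundant to compute. This summary does not say how the paper chooses which classes to keep. The sketch therefore assumes uniform sampling of num_neg negative classes without replacement. The names sampled_d_softmax, dot and num_neg are hypothetical. Only the sampled class weights are multiplied with the embedding, and that is where the computation is saved.

```python
import math
import random


def log1p_sum_exp(xs):
    """Numerically stable log(1 + sum(exp(x) for x in xs))."""
    m = max([0.0, *xs])
    return m + math.log(math.exp(-m) + sum(math.exp(x - m) for x in xs))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sampled_d_softmax(x, weights, y, eps, num_neg, rng):
    """Hypothetical sampled variant: returns (intra, inter, logits_computed).

    ASSUMPTION: negatives are drawn uniformly at random without replacement,
    and only their class weights are multiplied with the embedding x.
    """
    # The intra-class term needs only the target class, so it is always exact.
    s_y = dot(x, weights[y])
    intra = log1p_sum_exp([eps - s_y])
    # The inter-class term uses a random subset of the non-target classes.
    negatives = [k for k in range(len(weights)) if k != y]
    chosen = rng.sample(negatives, min(num_neg, len(negatives)))
    inter = log1p_sum_exp([dot(x, weights[k]) - eps for k in chosen])
    return intra, inter, 1 + len(chosen)


rng = random.Random(0)
num_classes, dim = 1000, 8
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_classes)]
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
intra, inter, cost = sampled_d_softmax(x, weights, y=3, eps=4.0, num_neg=15, rng=rng)
print(cost, "of", num_classes, "class logits computed")
```

The sampled inter term is never larger than the full one, because it sums fewer positive terms. It approximates the full objective rather than reproducing it. The dissection is what keeps this clean: sampling negatives leaves the intra-class term exact, since that term never touches the non-target classes.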
Abstract
The softmax loss and its variants are widely used as objectives for embedding learning, especially in applications like face recognition. However, the intra- and inter-class objectives in the softmax loss are entangled; therefore, a well-optimized inter-class objective leads to relaxation of the intra-class objective, and vice versa. In this paper, we propose to dissect the softmax loss into independent intra- and inter-class objectives (D-Softmax). With D-Softmax as the objective, we gain a clear understanding of both the intra- and inter-class objectives, so it is straightforward to tune each part to its best state. Furthermore, we find that the computation of the inter-class objective is redundant and propose two sampling-based variants of D-Softmax to reduce the computation cost. When training with regular-scale data, face verification experiments show that D-Softmax is favorably comparable…
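The entanglement claim can be checked on the gradient that the target logit s_y receives. For the standard softmax loss this gradient is p_y - 1, which depends on every non-target logit through p_y. For comparison, the sketch assumes a dissected intra term of the form log(1 + exp(eps - s_y)), with a hypothetical constant eps. The gradient of that term depends on s_y alone. The function names are hypothetical.

```python
import math


def softmax_grad_target(logits, y):
    """Gradient of the standard softmax loss w.r.t. the target logit: p_y - 1."""
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    return exps[y] / sum(exps) - 1.0


def intra_grad_target(s_y, eps):
    """Gradient of a hypothetical intra term log(1 + exp(eps - s_y)) w.r.t. s_y."""
    return -1.0 / (1.0 + math.exp(s_y - eps))


s_y, eps = 2.0, 5.0
close_negatives = [s_y, 1.0, 1.0, 1.0]      # inter-class objective poorly optimized
far_negatives = [s_y, -10.0, -10.0, -10.0]  # inter-class objective well optimized
print(softmax_grad_target(close_negatives, 0))
print(softmax_grad_target(far_negatives, 0))
print(intra_grad_target(s_y, eps))
```

Take a modest target logit s_y = 2. The softmax gradient is about -0.52 when the negatives sit near s_y. It shrinks to roughly -2e-5 once the negatives are pushed far away. The pull on s_y is relaxed precisely because the inter-class objective is well optimized. The assumed intra term keeps a gradient of about -0.95 in both cases.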
Taxonomy
Topics: Face recognition and analysis · Domain Adaptation and Few-Shot Learning · Face and Expression Recognition
Methods: Additive Angular Margin Loss · Softmax
