Investigation of Large-Margin Softmax in Neural Language Modeling
Jingjing Huo, Yingbo Gao, Weiyue Wang, Ralf Schlüter, Hermann Ney

TL;DR
This paper applies large-margin softmax techniques, which have been successful in face recognition, to neural language models. The aim is to improve discriminative power, and the paper analyzes the effect on speech recognition performance.
Contribution
It introduces large-margin softmax into neural language modeling, compares margin types and norm-scaling strategies, and evaluates their impact on speech recognition metrics.
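To make the idea of a margin in the softmax concrete, here is a minimal NumPy sketch of one margin form known from the face recognition literature, an additive cosine margin with a fixed norm scale. This summary does not state which margin variants or scaling strategies the paper uses, so the function name, the `margin` and `scale` values, and the choice of this particular margin are all assumptions made for illustration.

```python
import numpy as np

def additive_margin_softmax_loss(features, weights, target, margin=0.35, scale=30.0):
    """Cross entropy with an additive cosine margin on the target class.

    features: (d,) context vector produced by the language model.
    weights:  (V, d) output word embeddings, one row per vocabulary word.
    target:   index of the correct next word.
    margin, scale: illustrative values, not taken from the paper.
    """
    # Normalize both sides so that each logit is a cosine in [-1, 1].
    f = features / np.linalg.norm(features)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = w @ f

    # Rescale the cosines, subtracting the margin only for the target class.
    logits = scale * cos
    logits[target] = scale * (cos[target] - margin)

    # Numerically stable log-softmax, then negative log-likelihood.
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[target]
```

With `margin=0.0` this reduces to a softmax over scaled cosines. A positive margin lowers the target logit during training, so the model has to separate the correct word from the others by a larger angular gap to reach the same loss.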
Findings
Perplexity slightly worsened with large-margin softmax
Word error rate remained comparable to standard softmax
Semantic and syntactic relationships are preserved in word vectors
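For readers less familiar with the metric in the first finding: perplexity is the exponential of the average negative log-probability that the model assigns to the reference tokens, so lower is better. A minimal sketch follows, assuming natural-log probabilities as input; the function name is hypothetical and not taken from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from the natural-log probabilities of each reference token."""
    # Average negative log-likelihood per token (cross entropy in nats).
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)
```

A model that assigns probability 0.25 to every reference token has perplexity 4, that is, it is as uncertain as a uniform choice among four words.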
Abstract
To encourage intra-class compactness and inter-class separability among trainable feature vectors, large-margin softmax methods are developed and widely applied in the face recognition community. The introduction of the large-margin concept into the softmax is reported to have good properties such as enhanced discriminative power, less overfitting and well-defined geometric intuitions. Nowadays, language modeling is commonly approached with neural networks using softmax and cross entropy. In this work, we are curious to see if introducing large-margins to neural language models would improve the perplexity and consequently word error rate in automatic speech recognition. Specifically, we first implement and test various types of conventional margins following the previous works in face recognition. To address the distribution of natural language data, we then compare different…
Taxonomy
Topics: Speech Recognition and Synthesis · Face recognition and analysis · Speech and Audio Processing
Methods: Softmax
