Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks
Chunjie Luo, Jianfeng Zhan, Lei Wang, Qiang Yang

TL;DR
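This paper introduces cosine normalization, which replaces the dot product in a neural network with cosine similarity (or centered cosine similarity) so that the input to each activation function is bounded. The aim is lower variance and more stable training. In the reported experiments on MNIST, 20NEWS GROUP, CIFAR-10/100, and SVHN, it outperforms batch, weight, and layer normalization.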
Contribution
It proposes a novel normalization technique using cosine similarity, addressing variance issues in neural networks and outperforming traditional normalization methods.
Findings
Cosine normalization bounds the input to the activation function, which reduces its variance compared with the unbounded dot product.
It outperforms batch, weight, and layer normalization in the reported experiments.
It improves generalization and training stability across multiple datasets.
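The boundedness behind the first finding can be written out explicitly. The notation here is an assumption for illustration (w for a neuron's incoming weight vector, x for the previous layer's output vector); it follows the standard definition of cosine similarity rather than being quoted from the paper.

```latex
\[
\cos\theta
  = \frac{\vec{w}\cdot\vec{x}}{\lVert\vec{w}\rVert\,\lVert\vec{x}\rVert},
\qquad
\lvert \vec{w}\cdot\vec{x} \rvert \le \lVert\vec{w}\rVert\,\lVert\vec{x}\rVert
\;\Longrightarrow\;
-1 \le \cos\theta \le 1 .
\]
```

The inequality is Cauchy-Schwarz. The dot product alone grows with the magnitude of either vector, while the normalized quantity stays in [-1, 1] regardless of scale.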
Abstract
Traditionally, multi-layer neural networks use the dot product between the output vector of the previous layer and the incoming weight vector as the input to the activation function. The result of the dot product is unbounded, which increases the risk of large variance. Large variance of a neuron makes the model sensitive to changes in the input distribution, which results in poor generalization and aggravates the internal covariate shift that slows down training. To bound the dot product and decrease the variance, we propose to use cosine similarity or centered cosine similarity (Pearson Correlation Coefficient) instead of the dot product in neural networks, which we call cosine normalization. We compare cosine normalization with batch, weight and layer normalization in fully-connected neural networks as well as convolutional networks on the data sets of MNIST, 20NEWS GROUP, CIFAR-10/100 and SVHN.…
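As a rough illustration of the idea in the abstract, the following NumPy sketch computes the input to the activation function of a fully-connected layer in three ways: plain dot product, cosine similarity, and centered cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `eps` stabilizer, the tensor shapes, and the omission of a bias term are all choices made here for illustration.

```python
import numpy as np


def cosine_norm(x, W, eps=1e-8):
    """Cosine similarity between each input row and each weight column.

    x: (batch, d_in) outputs of the previous layer (assumed layout)
    W: (d_in, d_out) incoming weights, one column per neuron (assumed layout)
    eps: small constant to avoid division by zero (an assumption, not from the paper)
    """
    x_len = np.linalg.norm(x, axis=1, keepdims=True)  # (batch, 1)
    w_len = np.linalg.norm(W, axis=0, keepdims=True)  # (1, d_out)
    return (x @ W) / (x_len * w_len + eps)


def centered_cosine_norm(x, W, eps=1e-8):
    """Centered cosine similarity (Pearson correlation): subtract each
    vector's own mean, then take the cosine similarity."""
    x_centered = x - x.mean(axis=1, keepdims=True)
    W_centered = W - W.mean(axis=0, keepdims=True)
    return cosine_norm(x_centered, W_centered, eps)


rng = np.random.default_rng(0)
x = rng.normal(scale=10.0, size=(4, 16))  # deliberately large-magnitude inputs
W = rng.normal(size=(16, 8))

dot = x @ W                       # unbounded: grows with the scale of x and W
cos = cosine_norm(x, W)           # bounded in [-1, 1]
pcc = centered_cosine_norm(x, W)  # bounded in [-1, 1]

print("max |dot|:", np.abs(dot).max())
print("max |cos|:", np.abs(cos).max())
print("max |pcc|:", np.abs(pcc).max())
```

Rescaling `x` changes `dot` proportionally but leaves `cos` essentially unchanged, which is the scale insensitivity the abstract relies on when it argues for lower variance.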
Taxonomy
Topics: Neural Networks and Applications · Advanced Neural Network Applications · Machine Learning and ELM
Methods: Cosine Normalization
