Breaking the Softmax Bottleneck for Sequential Recommender Systems with Dropout and Decoupling
Ying-Chen Lin

TL;DR
This paper identifies additional causes of the Softmax bottleneck in session-based recommender systems (SBRSs) and proposes Dropout and Decoupling (D&D) to improve model accuracy without increasing computational complexity.
Contribution
The paper reveals new aspects of the Softmax bottleneck in SBRSs and introduces D&D, a simple method that enhances Softmax-based models' expressivity and accuracy.
Findings
D&D significantly improves SBRS accuracy.
D&D matches or surpasses complex methods like MLP and MoS.
D&D maintains the same time complexity as standard Softmax models.
Abstract
The Softmax bottleneck was first identified in language modeling as a theoretical limit on the expressivity of Softmax-based models. As one of the most widely used ways to output probabilities, Softmax-based models have found a wide range of applications, including session-based recommender systems (SBRSs). A Softmax-based model consists of a Softmax function on top of a final linear layer. The bottleneck has been shown to be caused by rank deficiency in the final linear layer, owing to its connection with matrix factorization. In this paper, we show that there are more aspects to the Softmax bottleneck in SBRSs. Contrary to common belief, overfitting does occur in the final linear layer, even though it is usually associated with complex networks. Furthermore, we identify that the common technique of sharing item embeddings among session sequences and the candidate pool creates a…
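The rank-deficiency argument in the abstract can be illustrated with a small numerical sketch. This is not code from the paper: the sizes, the random matrices `H` (session representations) and `W` (final-layer item embeddings), and all variable names are hypothetical. It only shows that when the final linear layer has width `d`, the logit matrix has rank at most `d`, and the log-probability matrix has rank at most `d + 1`, regardless of how many items there are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the embedding width d is much smaller than the item count.
n_sessions, n_items, d = 50, 40, 8

H = rng.normal(size=(n_sessions, d))  # one row per session representation
W = rng.normal(size=(n_items, d))     # one row per candidate item embedding

# Final linear layer: every logit is a dot product of a session and an item vector.
logits = H @ W.T                      # shape (n_sessions, n_items)

# Numerically stable log-softmax over the items of each session.
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# The logits factor through a d-dimensional space, so their rank is at most d.
logit_rank = np.linalg.matrix_rank(logits)

# Log-softmax subtracts one scalar per row, which adds at most one to the rank.
log_prob_rank = np.linalg.matrix_rank(log_probs)

print("rank of logits:", logit_rank, "(bound:", d, ")")
print("rank of log-probabilities:", log_prob_rank, "(bound:", d + 1, ")")
```

If the true log-probability matrix over sessions and items has rank higher than `d + 1`, no choice of `H` and `W` can reproduce it exactly, which is the matrix-factorization view of the bottleneck that the abstract refers to.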
Taxonomy
Topics: Recommender Systems and Techniques · Topic Modeling · Caching and Content Delivery
Methods: Linear Layer · Softmax · Dropout
