Correcting the LogQ Correction: Revisiting Sampled Softmax for Large-Scale Retrieval
Kirill Khrylchenko, Vladimir Baikalov, Sergei Makeev, Artem Matveev, Sergei Liamaev

TL;DR
This paper refines the logQ correction technique for sampled softmax in large-scale retrieval models, addressing bias issues and introducing a new correction formula that improves model performance and interpretability.
Contribution
We identify a subtle flaw in the existing logQ correction and propose a refined formula that accounts for the always-present positive item, enhancing bias correction.
Findings
Our method outperforms standard logQ correction on multiple datasets.
The new correction introduces an interpretable sample weight based on model uncertainty.
Empirical results show consistent improvements in retrieval accuracy.
Abstract
Two-tower neural networks are a popular architecture for the retrieval stage in recommender systems. These models are typically trained with a softmax loss over the item catalog. However, in web-scale settings, the item catalog is often prohibitively large, making full softmax infeasible. A common solution is sampled softmax, which approximates the full softmax using a small number of sampled negatives. One practical and widely adopted approach is to use in-batch negatives, where negatives are drawn from items in the current mini-batch. However, this introduces a bias: items that appear more frequently in the batch (i.e., popular items) are penalized more heavily. To mitigate this issue, a popular industry technique known as logQ correction adjusts the logits during training by subtracting the log-probability of an item appearing in the batch. This correction is derived by analyzing…
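The standard logQ correction described in the abstract can be sketched as follows. This is a minimal NumPy sketch of the conventional correction, not the refined formula this paper proposes. It assumes dot-product two-tower scores and one positive item per row, with the other in-batch items acting as negatives. It also assumes the per-item log-probabilities `item_log_q` of appearing in the batch are estimated elsewhere, for example from item frequencies. The function names are hypothetical.

```python
import numpy as np


def logsumexp(x, axis=-1):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))).squeeze(axis)


def in_batch_softmax_loss(user_emb, item_emb, item_log_q=None):
    """Mean in-batch sampled softmax loss (hypothetical helper).

    user_emb: (B, d) query-tower embeddings.
    item_emb: (B, d) item-tower embeddings; row i is the positive for user i,
              and the remaining rows serve as in-batch negatives.
    item_log_q: optional (B,) log-probability of each item appearing in the
                batch. If given, it is subtracted from every logit in that
                item's column (standard logQ correction).
    """
    logits = user_emb @ item_emb.T  # (B, B) pairwise scores
    if item_log_q is not None:
        # Popular items (large log q) get their logits pushed down,
        # offsetting how often they show up as negatives.
        logits = logits - item_log_q[None, :]
    positive = np.diag(logits)  # score of each row's own positive item
    # Cross-entropy with the diagonal as target: logsumexp(row) - positive.
    return float(np.mean(logsumexp(logits, axis=1) - positive))


rng = np.random.default_rng(0)
users = rng.normal(size=(4, 8))
items = rng.normal(size=(4, 8))
log_q = np.log(np.array([0.4, 0.3, 0.2, 0.1]))  # assumed sampling probabilities
print(in_batch_softmax_loss(users, items))
print(in_batch_softmax_loss(users, items, log_q))
```

The correction is subtracted from every column, the positive's included. A constant `item_log_q` therefore leaves the loss unchanged, and only differences in estimated popularity reshape the softmax.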