Understanding the Role of Cross-Entropy Loss in Fairly Evaluating Large Language Model-based Recommendation
Cong Xu, Zhangchi Zhu, Jun Wang, Jianyong Wang, Wei Zhang

TL;DR
This paper critically evaluates the effectiveness of large language models in recommendation tasks, revealing that their perceived superiority is often overstated due to unfair comparison methods and highlighting the importance of proper evaluation standards.
Contribution
It provides a theoretical justification for the cross-entropy loss and shows that elementary approximations of it, with certain necessary modifications, can adequately replace it, challenging previous claims of LLMs' dominance in recommendation.
Findings
Cross-entropy loss is theoretically superior to the pointwise and pairwise losses commonly used to train conventional recommenders.
Existing LLM-based methods are less effective than previously claimed.
Proper evaluation reveals traditional methods can perform competitively.
Abstract
Large language models (LLMs) have gained much attention in the recommendation community; some studies have observed that LLMs fine-tuned with the cross-entropy loss over a full softmax can already achieve state-of-the-art performance. However, these claims are drawn from biased and unfair comparisons. Given the large number of items in practice, conventional recommenders typically adopt a pointwise or pairwise loss function for training instead. This substitution, however, causes severe performance degradation, leading to underestimation of conventional methods and overconfidence in the ranking capability of LLMs. In this work, we theoretically justify the superiority of cross-entropy and show that it can be adequately replaced by some elementary approximations with certain necessary modifications. The remarkable results across three public datasets corroborate that…
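To make the three loss families named in the abstract concrete, here is a minimal sketch that computes each of them for a single user with one positive item. It uses generic textbook forms (full-softmax cross-entropy, binary cross-entropy as the pointwise loss, and BPR as the pairwise loss); these are assumptions for illustration, not the paper's modified approximations, which the abstract does not spell out. All function names, the toy scores, and the uniform negative sampler are hypothetical.

```python
import math
import random


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def full_softmax_ce(scores, target):
    """Cross-entropy over ALL items: -log softmax(scores)[target]."""
    m = max(scores)  # subtract the max before exponentiating for stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[target]


def pointwise_bce(scores, target, negatives):
    """Pointwise loss: the positive is labeled 1, each sampled negative 0."""
    loss = -log_sigmoid(scores[target])
    loss += sum(-log_sigmoid(-scores[j]) for j in negatives)
    return loss


def pairwise_bpr(scores, target, negative):
    """Pairwise (BPR-style) loss: the positive should outscore one negative."""
    return -log_sigmoid(scores[target] - scores[negative])


def sample_negatives(num_items, target, k, rng):
    """Uniformly sample k distinct items other than the target (an assumption)."""
    candidates = [j for j in range(num_items) if j != target]
    return rng.sample(candidates, k)


# Toy example: hypothetical scores for 5 items; item 0 is the positive.
scores = [2.0, 0.5, -1.0, 0.0, 1.5]
target = 0
rng = random.Random(0)
negs = sample_negatives(len(scores), target, 2, rng)

ce = full_softmax_ce(scores, target)
bce = pointwise_bce(scores, target, negs)
bpr = pairwise_bpr(scores, target, negs[0])
print(f"full-softmax CE: {ce:.4f}, pointwise BCE: {bce:.4f}, pairwise BPR: {bpr:.4f}")
```

The full-softmax loss normalizes over every item, so its cost grows with the catalog size, while the pointwise and pairwise forms only touch the sampled negatives. That cost difference is the practical reason the abstract gives for conventional recommenders adopting the latter.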
Taxonomy
Topics: Topic Modeling
