CoNRec: Context-Discerning Negative Recommendation with LLMs
Xinda Chen, Jiawei Wu, Yishuang Liu, Jialin Zhu, Shuwen Xiao, Junjun Zheng, Xiangheng Kong, Yuning Jiang

TL;DR
This paper introduces CoNRec, a novel LLM-based framework that models user negative preferences in recommendation systems by leveraging semantic representations, context-aware training, and new evaluation metrics to improve understanding of negative feedback.
Contribution
It presents the first large language model framework for negative feedback modeling with context-discerning modules, semantic ID representations, and a progressive training paradigm to better capture negative user preferences.
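As a rough illustration of what a semantic ID can look like, the sketch below shows only the residual-quantization step that RQ-VAE-style tokenizers are built on: each level picks the nearest codebook entry and passes the leftover residual to the next level. It is a minimal sketch, not the paper's implementation. The function names, the toy codebooks, and the 2-D embeddings are all assumptions made for illustration, and encoder/decoder training is omitted.

```python
from typing import Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def semantic_id(embedding: Vector,
                codebooks: Sequence[Sequence[Vector]]) -> Tuple[int, ...]:
    """Quantize level by level; each level encodes the residual left by the previous one."""
    residual = list(embedding)
    codes = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        codes.append(idx)
        # Subtract the chosen code so the next level sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


# Hypothetical toy codebooks: two levels, two entries each, 2-D vectors.
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (0.25, -0.25)],
]

item_code = semantic_id((1.2, 0.8), CODEBOOKS)
```

With these toy values, an item embedded at `(1.2, 0.8)` is assigned one index per level, and items with similar embeddings share leading indices, which is what lets a language model treat the tuple as a short token sequence.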
Findings
Enhanced negative feedback understanding through semantic ID representations
Improved model performance with the Progressive GRPO training paradigm
New reward function and metrics based on multi-day future negative feedback
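To make the last two findings more concrete, here is a minimal sketch of how a reward built from multi-day future negative feedback could feed GRPO-style training. The reward definition (fraction of predicted items that the user later marks negatively within a future window) is an assumption for illustration and is not taken from the paper. The advantage step follows the standard group-relative normalization used in GRPO, with the population standard deviation chosen here; implementations differ on that detail. All function and variable names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Iterable, List, Set


def future_hit_reward(predicted: Iterable[str], future_negatives: Set[str]) -> float:
    """Fraction of predicted items found in the user's future-window negative feedback."""
    predicted = list(predicted)
    if not predicted:
        return 0.0
    hits = sum(1 for item in predicted if item in future_negatives)
    return hits / len(predicted)


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one sampled group (GRPO-style advantage)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical data: items the user gave negative feedback on over the future window.
future_negatives = {"a", "b", "c"}

# One group of sampled predictions for the same user prompt.
group = [["a", "b"], ["a", "x"], ["y", "z"]]

rewards = [future_hit_reward(pred, future_negatives) for pred in group]
advantages = group_relative_advantages(rewards)
```

Aggregating negatives over several days instead of using only the next feedback item gives a denser target, so more sampled predictions receive a non-zero reward and the group-relative advantages carry more signal.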
Abstract
Understanding what users like is relatively straightforward; understanding what users dislike, however, remains a challenging and underexplored problem. Research into users' negative preferences has gained increasing importance in modern recommendation systems. Numerous platforms have introduced explicit negative feedback mechanisms and leverage such signals to refine their recommendation models. Beyond traditional business metrics, user experience-driven metrics, such as negative feedback rates, have become critical indicators for evaluating system performance. However, most existing approaches primarily use negative feedback as an auxiliary signal to enhance positive recommendations, paying little attention to directly modeling negative interests, which can be highly valuable in offline applications. Moreover, due to the inherent sparsity of negative feedback data, models often suffer…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The motivation of CoNRec is interesting, and the preliminary motivation study clearly shows the misalignment between users' negative interests and the next feedback item, as well as the performance drop when extra context is added.
2. The proposed method is easy to follow and the corresponding figures are easy to understand.
3. This paper defines a new task scenario, Negative Recommendation, and utilizes several evaluation metrics tailored to this scenario, such as FHR@20, LUF@20, and LIF@20.
4. The author cl
1. The technical contribution is limited and the proposed method is not very novel: RQ-VAE, LoRA, and GRPO are already widely used in recommendation studies.
2. CoNRec leverages explicit negative user feedback, whereas most of the baselines mainly rely on users’ historical interactions. This may lead to an unfair comparison. The authors should implement several baselines that also model negative feedback, or incorporate negative feedback into existing baselines to ensure
- **S1: Intuitive Method.** The designs of progressive GRPO, ground-truth extension, and reward shaping are intuitive and well-motivated. These components enhance the consistency between negative items and users’ negative interests, offering valuable insights for future research on negative preference modeling.
- **S2: Comprehensive Analysis.** The ablation and analysis studies thoroughly validate the effectiveness of each component and investigate the impact of different reward formulations on
- **W1: Lack of Datasets.** The paper evaluates CoNRec on only one dataset, which is insufficient to convincingly demonstrate the model’s effectiveness. Incorporating additional datasets would strengthen the validity of the conclusions.
- **W2: Reproducibility.** The paper does not provide source code, which affects the reproducibility and transparency of the results. It is recommended that the authors release the corresponding code to enhance credibility and facilitate future research.
1. The motivation is clearly grounded in real industrial needs, addressing limitations of rule-based filtering and single-instance negative feedback.
2. The method shows stronger performance for long-tail users and long-tail items, indicating enhanced robustness under sparse feedback conditions.
3. The use of a 7-day aggregated negative feedback signal is data-driven and empirically justified, improving the stability of negative preference supervision.
1. The core reliance on RQ-VAE semantic coding lacks validation regarding stability, interpretability, and semantic consistency. Critical configuration and robustness analyses are missing.
2. The contrastive alignment module may not ensure true separation of “liked vs. disliked” semantics and may instead capture co-purchase or exposure patterns. No embedding visualization or case analysis supports its claimed effect.
3. The progressive GRPO strategy is heuristic, lacking theoretical grounding an
Taxonomy
Topics: Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing · Advanced Bandit Algorithms Research
