LLM-CoT Enhanced Graph Neural Recommendation with Harmonized Group Policy Optimization
Hailong Luo, Bin Wu, Hongyong Jia, Qingqing Zhu, Lianlei Shan

TL;DR
This paper introduces LGHRec, a novel recommendation framework that uses LLM Chain-of-Thought reasoning to generate semantic IDs and a reinforcement learning-based optimization method to improve contrastive learning, leading to better recommendation performance.
Contribution
It proposes a new framework integrating LLM-CoT reasoning for semantic ID generation and a harmonized group policy optimization for contrastive learning, addressing key limitations in existing GNN-based recommenders.
Findings
LGHRec outperforms baseline models on three datasets.
Semantic IDs from LLM-CoT improve representation quality.
Harmonized Group Policy Optimization enhances long-tail recommendations.
Abstract
Graph neural networks (GNNs) have advanced recommender systems by modeling interaction relationships. However, existing graph-based recommenders rely on sparse ID features and do not fully exploit textual information, resulting in low information density within representations. Furthermore, graph contrastive learning faces challenges: random negative sampling can introduce false negative samples, while fixed temperature coefficients cannot adapt to the heterogeneity of different nodes. In addition, current efforts to enhance recommendations with large language models (LLMs) have not fully utilized their Chain-of-Thought (CoT) reasoning capabilities to guide representation learning. To address these limitations, we introduce LGHRec (LLM-CoT Enhanced Graph Neural Recommendation with Harmonized Group Policy Optimization). This framework leverages the CoT reasoning ability of LLMs to…
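The two contrastive-learning limitations named in the abstract (false negatives from random sampling, and a single fixed temperature for all nodes) can be illustrated with a minimal sketch. This is a generic InfoNCE loss, not LGHRec's actual objective; the function name `info_nce`, the toy 2-D embeddings, and the temperature values are all assumptions chosen for illustration.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature):
    """Per-anchor InfoNCE loss (illustrative; not the paper's objective).

    anchor, positive: (n, d) L2-normalized embeddings.
    negatives:        (n, k, d) sampled negatives per anchor.
    temperature:      scalar (fixed) or (n,) array (one value per node).
    """
    n = anchor.shape[0]
    tau = np.broadcast_to(np.asarray(temperature, dtype=float), (n,))
    pos = np.sum(anchor * positive, axis=1) / tau                     # (n,)
    neg = np.einsum("nd,nkd->nk", anchor, negatives) / tau[:, None]   # (n, k)
    logits = np.concatenate([pos[:, None], neg], axis=1)              # (n, 1 + k)
    # Stable log-sum-exp over the positive and all negatives.
    m = logits.max(axis=1)
    log_denom = m + np.log(np.exp(logits - m[:, None]).sum(axis=1))
    return log_denom - pos

# Toy 2-D unit vectors (hypothetical data).
anchor = np.array([[1.0, 0.0]])
positive = np.array([[1.0, 0.0]])
true_negative = np.array([[[0.0, 1.0]]])   # genuinely dissimilar item
false_negative = np.array([[[1.0, 0.0]]])  # sampled "negative" that is actually similar

# Fixed temperature: the false negative is penalized as if it were dissimilar.
loss_true = info_nce(anchor, positive, true_negative, temperature=0.2)
loss_false = info_nce(anchor, positive, false_negative, temperature=0.2)

# Per-node temperatures: the same function accepts one value per anchor.
loss_per_node = info_nce(
    np.repeat(anchor, 2, axis=0),
    np.repeat(positive, 2, axis=0),
    np.repeat(true_negative, 2, axis=0),
    temperature=np.array([0.1, 0.5]),
)
```

In this toy setup the false negative raises the loss even though the anchor and positive are perfectly aligned, and the per-node call shows how a temperature vector changes each node's loss independently. How LGHRec actually selects negatives and temperatures is not specified in the text above.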
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The paper proposes a new RecSys training paradigm that aims to address the shortcomings of previous efforts.
2. The paper proposes a deep semantic embedding generator that produces much richer information through a fine-tuned LLM.
3. The paper designs a new reinforcement learning training algorithm based on grouped users/items.
See Questions.
1. This work integrates LLMs and graph learning for recommendation, which is a novel topic that captures current trends and interests in the field.
2. The paper is well-written and easy to follow, with a clear articulation of the motivation behind the research.
3. The conducted experiments demonstrate the strengths of the proposed method, providing solid evidence of its effectiveness.
1. In terms of overall comparison, it appears that the authors only compare their proposed method against the base model. To further validate the effectiveness of their approach, they should consider including comparisons with other LLM-enhanced recommendation systems.
2. The authors should incorporate both significance analysis and case studies to provide a more comprehensive demonstration of the effectiveness of their work.
3. What is the efficiency of the proposed method? The authors should
- The paper effectively leverages LLM-based high-quality representations through the generation of semantic IDs, which contributes to the observed performance gains.
- The experiment that groups users by interaction count and demonstrates robustness under long-tailed user distributions aligns well with the claimed contributions in the introduction, reinforcing the practical relevance of the proposed approach.
- The paper presents an effective ablation study, which thoroughly evaluates the contri
- While the authors conduct a sensitivity analysis, the method still involves a large number of hyperparameters that require careful tuning, which may limit the practical applicability and ease of deployment of the approach.
- Since Entropy Regularization loss is related to long-tail items, showing an ablation study on long-tailed items with this loss may enhance the novelty of the proposed method.
- Although random negative sampling is acknowledged as a limitation, many graph-based contrastive
**Originality:**
- Combining LLM CoT reasoning with GNN-based recommendations is a reasonable idea
- The cross-group coordination mechanism in HGPO addresses a gap in GRPO
- Offline preprocessing strategy avoids online LLM inference latency

**Quality:**
- Extensive experiments across 9 baseline models and 3 datasets
- Detailed ablation studies examining different components
- Analysis of performance across different user/item activity levels (Figure 5)
- Convergence proof provided in appendix (
### 1. **Limited Technical Novelty**
The paper combines existing techniques without significant innovation:

**DSEG Component:**
- Using LLMs to generate text descriptions is standard practice
- CoT prompting (Figure 4) is straightforward application of existing techniques (Wei et al., 2022)
- Encoding with BERT is standard - **why not use the LLM's own embeddings?** This adds complexity
- Mixed fine-tuning to prevent catastrophic forgetting is well-known
- Simply concatenating semantic IDs wi
Taxonomy
Topics: Advanced Graph Neural Networks · Recommender Systems and Techniques · Machine Learning in Healthcare
Methods: Contrastive Learning
