Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation
Hongyang Su, Beibei Kong, Lei Cheng, Chengxiang Zhuo, Zang Li, Chenyun Yu

TL;DR
This paper introduces intent-driven Semantic ID (SID) generation for grounded conversational news recommendation. A two-stage training paradigm and Profile-Aware Dual-Signal Reasoning (PADR) improve relevance and keep recommendations grounded in the current news pool.
Contribution
It proposes a Generate-then-Match paradigm with multi-task training and Chain-of-Thought distillation, enabling hierarchical SID mapping and effective cold-start recommendations.
Findings
Achieves 0% hallucination and 12.4% L1 match on a large Chinese news dataset.
Surpasses GPT-4+Hybrid RAG on finer-grained metrics at 100x lower cost.
Provides effective recommendations for cold-start users with 18.0% L1 match.
Abstract
Conversational news recommendation requires grounding each suggestion in a rapidly evolving article corpus while addressing implicit user intents that lack explicit retrievable keywords. To characterize this scenario, we identify 6 intent types from production dialogues: five are implicit and pose fundamental challenges to standard RAG pipelines, forming a critical retrieve-first bottleneck. To address these issues, we introduce intent-driven Semantic ID (SID) generation under a Generate-then-Match paradigm. With two-stage training that consists of multi-task SID alignment and GPT-4 Chain-of-Thought distillation, an LLM maps diverse intents to hierarchical SID prefixes, which are then fuzzy-matched to the current news pool to guarantee fully grounded recommendations. Profile-Aware Dual-Signal Reasoning (PADR) further enables cold-start users to obtain valid recommendations using only…
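The matching half of the Generate-then-Match paradigm can be pictured with a small sketch. This is an illustration only, not the authors' implementation: the abstract does not specify how SIDs are encoded or how the fuzzy match is scored, so the tuple-of-integers encoding, the shared-prefix score, and the names `shared_prefix_len` and `match_to_pool` are all assumptions made for this example. The one property it does reflect from the abstract is that results are drawn only from the current pool, so a generated SID that matches nothing yields no recommendation rather than an invented one.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: one integer code per hierarchy level (L1, L2, L3, ...).
SID = Tuple[int, ...]


def shared_prefix_len(a: SID, b: SID) -> int:
    """Count how many leading hierarchy levels two SIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_to_pool(
    generated: SID, pool: Dict[SID, str], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Return up to top_k (article_id, shared_levels) pairs.

    Candidates come only from `pool`, so every result is a real article.
    Articles that disagree with the generated SID at the first level are
    dropped (an assumed threshold, chosen for illustration).
    """
    scored = [
        (article_id, shared_prefix_len(generated, sid))
        for sid, article_id in pool.items()
    ]
    scored = [s for s in scored if s[1] >= 1]
    # Longest shared prefix first; ties broken by article id for determinism.
    scored.sort(key=lambda s: (-s[1], s[0]))
    return scored[:top_k]


# Toy pool of current articles, keyed by made-up SIDs.
pool = {
    (3, 1, 4): "news_a",
    (3, 1, 7): "news_b",
    (3, 2, 0): "news_c",
    (5, 0, 2): "news_d",
}

# Suppose the LLM generated an SID that is not itself in the pool.
results = match_to_pool((3, 1, 9), pool)
no_match = match_to_pool((9, 9, 9), pool)
```

Here `results` ranks the two articles that share two levels with the generated SID ahead of the one that shares only the first level, and `no_match` is empty because nothing in the pool agrees at the first level.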
