Intent-Driven Semantic ID Generation for Grounded Conversational News Recommendation

Hongyang Su; Beibei Kong; Lei Cheng; Chengxiang Zhuo; Zang Li; Chenyun Yu

arXiv:2605.07613 · cs.CL · May 11, 2026

TL;DR

This paper introduces an intent-driven Semantic ID (SID) generation method for grounded conversational news recommendation. A two-stage training paradigm and profile-aware dual-signal reasoning improve relevance and reduce hallucinations.

Contribution

The paper proposes a Generate-then-Match paradigm trained with multi-task SID alignment and Chain-of-Thought distillation. The model maps user intents to hierarchical SID prefixes and supports cold-start recommendation.

Findings

01. Achieves 0% hallucination and 12.4% L1 match on a large Chinese news dataset.

02. Surpasses GPT-4+Hybrid RAG on finer-grained metrics at 100x lower cost.

03. Provides effective recommendations for cold-start users, with 18.0% L1 match.

Abstract

Conversational news recommendation requires grounding each suggestion in a rapidly evolving article corpus while addressing implicit user intents that lack explicit retrievable keywords. To characterize this scenario, we identify 6 intent types from production dialogues: five are implicit and pose fundamental challenges to standard RAG pipelines, forming a critical retrieve-first bottleneck. To address these issues, we introduce intent-driven Semantic ID (SID) generation under a Generate-then-Match paradigm. With two-stage training that consists of multi-task SID alignment and GPT-4 Chain-of-Thought distillation, an LLM maps diverse intents to hierarchical SID prefixes, which are then fuzzy-matched to the current news pool to guarantee fully grounded recommendations. Profile-Aware Dual-Signal Reasoning (PADR) further enables cold-start users to obtain valid recommendations using only…
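
The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a SID is a tuple of integer codes, one per hierarchy level, and it assumes "fuzzy matching" means ranking pool articles by how many leading levels they share with the generated SID. The names `shared_prefix_len`, `match_to_pool`, and the toy pool are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a SID is a tuple of integer codes, one per hierarchy level.
SID = Tuple[int, ...]


def shared_prefix_len(a: SID, b: SID) -> int:
    """Count how many leading levels two SIDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def match_to_pool(
    generated: SID, pool: Dict[str, SID], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Rank pool articles by shared prefix depth with the generated SID.

    Only article IDs present in `pool` can be returned, so every result
    refers to a real article. Articles sharing no level are dropped.
    """
    scored = [(aid, shared_prefix_len(generated, sid)) for aid, sid in pool.items()]
    scored = [(aid, depth) for aid, depth in scored if depth > 0]
    # Deeper match first; ties broken by article ID for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Toy news pool (hypothetical data).
pool = {
    "article_a": (3, 17, 5),
    "article_b": (3, 17, 9),
    "article_c": (3, 2, 1),
    "article_d": (8, 4, 4),
}

# Pretend the LLM generated this SID; no article in the pool has it exactly.
results = match_to_pool((3, 17, 6), pool)
print(results)
```

In this sketch the generated SID never reaches the user directly; only pool entries do. That is one way to read the abstract's claim that matching against the current news pool keeps recommendations grounded.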
