HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation
Subham Raj, Aman Vaibhav Jha, Mayank Anand, Sriparna Saha

TL;DR
HARPO introduces a hierarchical, decision-oriented framework for conversational recommendation that explicitly optimizes multi-dimensional recommendation quality, leading to improved user-aligned suggestions across multiple datasets.
Contribution
HARPO presents a novel agentic framework that combines hierarchical preference learning with tree-search reasoning to improve recommendation quality in conversational recommender systems (CRSs).
Findings
HARPO outperforms baselines on recommendation metrics across three datasets.
Explicit multi-dimensional preference modeling improves recommendation relevance and diversity.
Tree-search reasoning guided by a learned value network enhances decision quality.
Abstract
Conversational recommender systems (CRSs) operate under incremental preference revelation, requiring systems to make recommendation decisions under uncertainty. While recent approaches, particularly those built on large language models, achieve strong performance on standard proxy metrics such as Recall@K and BLEU, they often fail to deliver high-quality, user-aligned recommendations in practice. This gap arises because existing methods primarily optimize for intermediate objectives such as retrieval accuracy, fluent generation, or tool invocation, rather than recommendation quality itself. We propose HARPO (Hierarchical Agentic Reasoning with Preference Optimization), an agentic framework that reframes conversational recommendation as a structured decision-making process explicitly optimized for multi-dimensional recommendation quality. HARPO integrates hierarchical preference learning that…
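For readers unfamiliar with Recall@K, one of the proxy metrics named in the abstract, a minimal sketch of its common definition follows. This is not code from the paper: the function name `recall_at_k` and the item identifiers are illustrative assumptions, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of a ranked list.

    ranked_items: item IDs ordered from most to least recommended (hypothetical IDs).
    relevant_items: ground-truth item IDs the user actually accepted or liked.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_items)
    if not relevant:
        # No ground truth to recover; returning 0.0 is a convention, not a rule.
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & relevant) / len(relevant)


if __name__ == "__main__":
    ranking = ["movie_a", "movie_b", "movie_c", "movie_d"]
    liked = ["movie_b", "movie_d"]
    print(recall_at_k(ranking, liked, k=2))
    print(recall_at_k(ranking, liked, k=4))
```

The metric only checks whether the ground-truth items were retrieved. It says nothing about diversity or other quality dimensions, which is the gap the abstract points to.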
