Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Yan Wang; Yi Han; Lingfei Qian; Yueru He; Xueqing Peng; Dongji Feng; Zhuohan Xie; Vincent Jim Zhang; Rosie Guo; Fengran Mo; Jimin Huang; Yankai Chen; Xue Liu; Jian-Yun Nie

arXiv:2602.16990 · cs.AI · May 19, 2026

TL;DR

Conv-FinRe introduces a novel benchmark for financial recommendation that evaluates large language models on decision quality and behavioral alignment using real market data and advisory dialogues.

Contribution

It presents a new conversational and longitudinal benchmark that distinguishes normative utility from descriptive behavior in financial recommendations.

Findings

01. Models with high utility ranking often diverge from user choices.

02. Behaviorally aligned models tend to overfit short-term noise.

03. The benchmark reveals a tension between rational decision-making and mimicking user behavior.

Abstract

Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy or short-sighted under market volatility and may conflict with a user's long-term goals. Treating what users chose as the sole ground truth, therefore, conflates behavioral imitation with decision quality. We introduce Conv-FinRe, a conversational and longitudinal benchmark for stock recommendation that evaluates LLMs beyond behavior matching. Given an onboarding interview, step-wise market context, and advisory dialogues, models must generate rankings over a fixed investment horizon. Crucially, Conv-FinRe provides multi-view references that distinguish descriptive behavior from normative utility grounded in investor-specific risk preferences, enabling diagnosis of whether an LLM follows rational analysis, mimics user noise, or is driven by…
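To make the distinction between descriptive behavior and normative utility concrete, here is a minimal sketch of how one model ranking can be scored against both kinds of reference. Everything in it is an assumption for illustration: the mean-variance utility, the `risk_aversion` parameter, the tickers, the return series, and the metric choices (hit@k and Kendall's tau) are hypothetical and are not taken from the paper, which may define utility and its references differently.

```python
from statistics import mean, pvariance


def mean_variance_utility(returns, risk_aversion):
    """Illustrative utility: mean return minus a variance penalty.

    U = mean(r) - 0.5 * risk_aversion * var(r). This is a textbook
    mean-variance form used as a stand-in, not the paper's definition.
    """
    return mean(returns) - 0.5 * risk_aversion * pvariance(returns)


def utility_ranking(history, risk_aversion):
    """Rank tickers from highest to lowest utility for one investor."""
    return sorted(
        history,
        key=lambda t: mean_variance_utility(history[t], risk_aversion),
        reverse=True,
    )


def hit_at_k(ranking, chosen, k):
    """Descriptive view: did the user's actual choice land in the top k?"""
    return chosen in ranking[:k]


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items (no ties)."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # rank_a orders rank_a[i] ahead of rank_a[j]; check rank_b.
            if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical per-period returns for three made-up tickers.
history = {
    "AAA": [0.02, -0.03, 0.05, -0.04],
    "BBB": [0.01, 0.01, 0.00, 0.01],
    "CCC": [0.10, -0.12, 0.15, -0.09],
}

risk_averse_ref = utility_ranking(history, risk_aversion=4.0)
risk_neutral_ref = utility_ranking(history, risk_aversion=0.0)

user_choice = "CCC"                      # hypothetical observed action
model_ranking = ["CCC", "AAA", "BBB"]    # hypothetical model output

behavior_hit = hit_at_k(model_ranking, user_choice, k=1)
utility_tau = kendall_tau(model_ranking, risk_averse_ref)
```

In this toy setup the model ranking matches the user's choice at rank 1 yet is the exact reverse of the risk-averse utility ranking, which is the kind of gap between behavior matching and decision quality that multi-view references are meant to expose. Changing `risk_aversion` changes the normative reference, so the same model output can score differently for different investors.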

Taxonomy

Topics: Recommender Systems and Techniques · Expert finding and Q&A systems · Stock Market Forecasting Methods