Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations

Hung-Chun Hsu; Yuan-Ching Kuo; Chao-Han Huck Yang; Szu-Wei Fu; Hanrong Ye; Hongxu Yin; Yu-Chiang Frank Wang; Ming-Feng Tsai; Chuan-Ju Wang

arXiv:2508.18132 · cs.IR · August 26, 2025

TL;DR

This paper introduces a test-time scaling framework that adds a reranking stage to multimodal conversational product retrieval. The framework improves accuracy by refining results at inference time so that they better capture evolving user intent.

Contribution

It presents a novel test-time reranking method that improves multimodal generative retrieval in multi-turn dialogues. The method addresses the limitations of existing approaches, which focus on single-turn queries.
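The following minimal Python sketch makes test-time reranking concrete. It shows a generic two-stage pattern and assumes that a first-stage retriever has already produced scored candidates. This is not the authors' method. `Candidate`, `rerank_at_test_time`, `overlap_score`, the blending weight `alpha`, and the toy catalog are all hypothetical. The word-overlap scorer stands in for what would, in practice, be a learned or MLLM-based relevance model that reads the whole dialogue.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    product_id: str
    retriever_score: float  # hypothetical: a normalized score from the first-stage retriever


def rerank_at_test_time(
    dialogue: Sequence[str],
    candidates: Sequence[Candidate],
    score_fn: Callable[[Sequence[str], str], float],
    alpha: float = 0.5,
) -> list[Candidate]:
    """Re-order first-stage candidates with an extra inference-time scoring pass.

    score_fn receives the dialogue, so it can react to intent expressed in later
    turns; alpha blends its output with the original retriever score.
    """

    def combined(c: Candidate) -> float:
        return alpha * c.retriever_score + (1 - alpha) * score_fn(dialogue, c.product_id)

    # Highest combined score first.
    return sorted(candidates, key=combined, reverse=True)


# Toy catalog (hypothetical), mapping product IDs to titles.
CATALOG = {
    "p1": "red running shoes",
    "p2": "blue running shoes",
    "p3": "blue leather boots",
}


def overlap_score(dialogue: Sequence[str], product_id: str) -> float:
    """Toy scorer: fraction of title words that appear in the most recent turn."""
    latest = set(dialogue[-1].lower().split())
    title = set(CATALOG[product_id].split())
    return len(latest & title) / len(title)


if __name__ == "__main__":
    dialogue = ["I need running shoes", "Actually, make them blue"]
    first_stage = [Candidate("p1", 0.9), Candidate("p2", 0.6), Candidate("p3", 0.2)]
    reranked = rerank_at_test_time(dialogue, first_stage, overlap_score)
    print([c.product_id for c in reranked])
```

Running it prints `['p2', 'p1', 'p3']`. The later turn ("make them blue") lifts `p2` above the retriever's original top pick. Blending the two scores with `alpha`, rather than discarding the first-stage score, is only one possible design choice. A pure reranker corresponds to `alpha=0`.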

Findings

1. Average gain of 14.5 points in MRR
2. Average gain of 10.6 points in nDCG@1
3. Consistent improvements across multiple benchmarks
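The findings above are stated in the two standard ranking metrics, MRR and nDCG@1. The sketch below shows how each is computed. It assumes binary relevance, and it assumes that "points" means percentage points on a 0-100 scale. Both are assumptions; the listing does not specify the paper's exact evaluation setup. The helper names and the two example queries are hypothetical.

```python
import math
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids: Sequence[str], relevant: set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the top k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_ids[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    # Two hypothetical queries, each with a single target product.
    runs = [(["p2", "p1", "p3"], {"p2"}), (["p1", "p3", "p2"], {"p3"})]
    mrr = mean([reciprocal_rank(r, rel) for r, rel in runs])
    ndcg1 = mean([ndcg_at_k(r, rel, k=1) for r, rel in runs])
    # Report on a 0-100 scale, matching the "point" phrasing of the findings.
    print(f"MRR = {100 * mrr:.1f}, nDCG@1 = {100 * ndcg1:.1f}")
```

It prints `MRR = 75.0, nDCG@1 = 50.0`. When each query has a single relevant product, nDCG@1 reduces to whether the top-ranked item is correct. This is why it can move differently from MRR, which also rewards hits at lower ranks.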

Abstract

The rapid evolution of e-commerce has exposed the limitations of traditional product retrieval systems in managing complex, multi-turn user interactions. Recent advances in multimodal generative retrieval -- particularly those leveraging multimodal large language models (MLLMs) as retrievers -- have shown promise. However, most existing methods are tailored to single-turn scenarios and struggle to model the evolving intent and iterative nature of multi-turn dialogues when applied naively. Concurrently, test-time scaling has emerged as a powerful paradigm for improving large language model (LLM) performance through iterative inference-time refinement. Yet, its effectiveness typically relies on two conditions: (1) a well-defined problem space (e.g., mathematical reasoning), and (2) the model's ability to self-correct -- conditions that are rarely met in conversational product search. In…