Image-Seeking Intent Prediction for Cross-Device Product Search
Mariya Hendriksen, Svitlana Vakulenko, Jordan Massiah, Gabriella Kazai, Emine Yilmaz

TL;DR
This paper introduces a novel task and model for predicting when a spoken e-commerce query calls for visual augmentation and a switch to a screen-enabled device, with the aim of improving personalized shopping experiences.
Contribution
It proposes the Image-Seeking Intent Prediction task, together with a new IRP model that leverages large-scale data, to improve the accuracy of cross-device product search.
Findings
Combining query semantics with product data improves prediction accuracy.
Lightweight summarization enhances model performance.
A differentiable loss reduces false positives.
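The summary does not specify the paper's loss. As a hypothetical illustration only, the sketch below shows one generic way a training loss can penalize false positives while staying differentiable: binary cross-entropy plus a "soft precision" term computed from predicted probabilities instead of hard 0/1 decisions. The names `soft_precision`, `precision_aware_loss`, and the weight `lam` are our own assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_precision(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Precision computed from probabilities rather than thresholded predictions.

    Soft TP = sum(p * y) and soft FP = sum(p * (1 - y)) are smooth in p, so the
    ratio is differentiable with respect to the model's logits.
    """
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    return tp / (tp + fp + eps)


def precision_aware_loss(
    logits: Sequence[float], labels: Sequence[int], lam: float = 1.0, eps: float = 1e-7
) -> float:
    """Hypothetical loss: mean binary cross-entropy plus lam * (1 - soft precision)."""
    probs = [sigmoid(z) for z in logits]
    bce = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / len(labels)
    return bce + lam * (1.0 - soft_precision(probs, labels, eps))


# Usage: the same positives, but one negative query scored confidently positive.
labels = [1, 0, 0, 1]
with_fp = [3.0, 2.5, -3.0, 2.0]   # index 1 is a negative scored high (a false positive)
clean = [3.0, -2.5, -3.0, 2.0]
print(precision_aware_loss(with_fp, labels) > precision_aware_loss(clean, labels))
```

Raising `lam` shifts the trade-off further toward precision, which matches the stated goal of avoiding unnecessary device-switch suggestions; in practice the same expression would be written in an autodiff framework so gradients flow through both terms.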
Abstract
Large Language Models (LLMs) are transforming personalized search, recommendations, and customer interaction in e-commerce. Customers increasingly shop across multiple devices, from voice-only assistants to multimodal displays, each offering different input and output capabilities. A proactive suggestion to switch devices can greatly improve the user experience, but it must be offered with high precision to avoid unnecessary friction. We address the challenge of predicting when a query requires visual augmentation and a cross-device switch to improve product discovery. We introduce Image-Seeking Intent Prediction, a novel task for LLM-driven e-commerce assistants that anticipates when a spoken product query should proactively trigger a visual on a screen-enabled device. Using large-scale production data from a multi-device retail assistant, including 900K voice queries, associated…
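The abstract stresses that a device-switch suggestion must be offered with high precision. One common way to enforce such a requirement, shown here as a sketch and not as the authors' procedure, is to pick the decision threshold on held-out data as the lowest score cut-off whose precision still meets a target. The functions `threshold_for_precision` and `should_suggest_switch` and the `target` value are hypothetical.

```python
from typing import Optional, Sequence


def threshold_for_precision(
    scores: Sequence[float], labels: Sequence[int], target: float = 0.9
) -> Optional[float]:
    """Return the lowest score threshold whose validation precision is >= target.

    Returns None if no threshold reaches the target precision.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    best: Optional[float] = None
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Only evaluate cut points between distinct scores, so ties are kept together.
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        if tp / (tp + fp) >= target:
            best = score  # keep lowering the threshold while precision holds
    return best


def should_suggest_switch(score: float, threshold: Optional[float]) -> bool:
    """Suggest a screen-enabled device only when the score clears the calibrated threshold."""
    return threshold is not None and score >= threshold


# Usage with toy validation scores for "this query needs a visual".
val_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
val_labels = [1, 1, 0, 1, 0, 0]
t = threshold_for_precision(val_scores, val_labels, target=0.75)
print(t, should_suggest_switch(0.72, t))
```

Because precision is not monotonic in the threshold, the loop scans every distinct cut-off and keeps the lowest one that satisfies the target, which preserves as much recall as the precision constraint allows.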
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
