MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding
Baixuan Xu, Weiqi Wang, Haochen Shi, Wenxuan Ding, Huihao Jing, Tianqing Fang, Jiaxin Bai, Xin Liu, Changlong Yu, Zheng Li, Chen Luo, Qingyu Yin, Bing Yin, Long Chen, Yangqiu Song

TL;DR
This paper introduces MIND, a multimodal framework leveraging vision-language models to infer and distill human-centric purchase intentions from e-commerce data, improving understanding and personalization in online shopping.
Contribution
MIND is the first multimodal approach that combines product images and metadata to generate a large-scale, human-centric intention knowledge base for e-commerce.
Findings
Created a multimodal intention knowledge base containing 1,264,441 intentions, derived from 126,142 co-buy shopping records across 107,215 products.
Human evaluations confirm high plausibility and typicality of intentions.
Enhanced large language models' performance on intention comprehension tasks.
Abstract
Improving user experience and providing personalized search results on e-commerce platforms rely heavily on understanding purchase intention. However, existing methods for acquiring large-scale intentions bank on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlook valuable visual information from product images, and incur high costs for scalability. To address these issues, we introduce MIND, a multimodal framework that allows Large Vision-Language Models (LVLMs) to infer purchase intentions from multimodal product metadata and prioritize human-centric ones. Using Amazon Review data, we apply MIND and create a multimodal intention knowledge base, which contains 1,264,441 intentions derived from 126,142 co-buy shopping records across 107,215 products. Extensive human evaluations…
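As a rough sanity check on the scale reported in the abstract, the counts imply about ten intentions per co-buy record. The short sketch below only does that arithmetic; the variable names are illustrative, and the per-record average is derived here from the reported totals rather than stated by the authors.

```python
# Counts as reported in the abstract.
num_intentions = 1_264_441      # intentions in the knowledge base
num_cobuy_records = 126_142     # co-buy shopping records
num_products = 107_215          # distinct products covered

# Derived figure (not reported in the source): mean intentions per co-buy record.
intentions_per_record = num_intentions / num_cobuy_records

print(f"{intentions_per_record:.2f} intentions per co-buy record")
# prints "10.02 intentions per co-buy record"
```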
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Digital Marketing and Social Media · Advanced Text Analysis Techniques
