PUMGPT: A Large Vision-Language Model for Product Understanding
Wei Xue, Zongyi Guo, Baoliang Cui, Zheng Xing, Xiaoyi Zeng, Xiufei Wang, Shuhui Wu, Weiming Lu

TL;DR
PumGPT is a domain-specific large vision-language model (LVLM) for e-commerce product understanding. On product understanding tasks it outperforms general-purpose LVLMs and GPT-4V, drawing on a curated product dataset and a set of specialized tasks.
Contribution
This paper introduces PumGPT, the first LVLM specialized for e-commerce, together with a new dataset and benchmark. The result is more accurate product understanding and better adaptation to the e-commerce domain.
Findings
PumGPT outperforms five open-source LVLMs and GPT-4V on product understanding tasks.
A curated dataset of 663k high-quality samples, filtered from over one million AliExpress products.
Demonstrates the importance of domain-specific models for e-commerce.
Abstract
E-commerce platforms benefit from accurate product understanding, which improves both user experience and operational efficiency. Traditional methods often focus on isolated tasks such as attribute extraction or categorization. This makes them hard to adapt as tasks evolve, and noisy data from the internet makes them difficult to use in practice. Current Large Vision-Language Models (LVLMs) lack domain-specific fine-tuning and therefore fall short in precision and instruction following. To address these issues, we introduce PumGPT, the first e-commerce specialized LVLM designed for multi-modal product understanding tasks. We collected and curated a dataset of over one million products from AliExpress. We then filtered out non-inferable attributes using a universal hallucination detection framework, which left 663k high-quality data samples. PumGPT focuses on five essential tasks aimed at enhancing workflows for…
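The abstract describes removing attributes that cannot be inferred from the product data before training. The paper's hallucination detection framework is not described here. The Python below is therefore only a minimal sketch of where such a filter would sit in a curation pipeline. It rests on two assumptions: a hypothetical `AttributeSample` record, and a deliberately crude stand-in check (the value must appear in the product title) in place of the authors' method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AttributeSample:
    """One (product, attribute, value) record. Field names are hypothetical."""
    product_id: str
    title: str
    attribute: str
    value: str


def is_inferable(sample: AttributeSample) -> bool:
    """Crude stand-in for an inferability check (NOT the paper's framework).

    Treats an attribute as inferable only if its non-empty value appears,
    case-insensitively, in the product title. Images are ignored entirely.
    """
    value = sample.value.strip().lower()
    return bool(value) and value in sample.title.lower()


def filter_samples(samples: List[AttributeSample]) -> List[AttributeSample]:
    """Keep only the samples that pass the inferability check."""
    return [s for s in samples if is_inferable(s)]


# Usage with two made-up records: the color is stated in the title,
# the brand is not, so only the color sample survives.
title = "Women's Red Cotton Summer Dress"
samples = [
    AttributeSample("p1", title, "color", "Red"),
    AttributeSample("p1", title, "brand", "Acme"),
]
kept = filter_samples(samples)
print([s.attribute for s in kept])  # ['color']
```

The filter is kept separate from the check on purpose. A model-based judge that also looks at the product image would presumably be closer in spirit to the paper's framework, and it could replace `is_inferable` without changing `filter_samples`.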
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Focus
