Visual Zero-Shot E-Commerce Product Attribute Value Extraction

Jiaying Gong; Ming Cheng; Hongda Shen; Pierre-Yves Vandenbussche; Janet Jenq; Hoda Eldardiry

arXiv:2502.15979 · cs.IR · February 25, 2025

TL;DR

This paper introduces ViOC-AG, a cross-modal zero-shot framework that extracts product attribute values from images alone, reducing seller effort and outperforming existing models in e-Commerce applications.

Contribution

The paper presents a novel CLIP-based zero-shot attribute value extraction method that requires only images, with a task-specific text decoder and OCR/LLM corrections, avoiding manual descriptions.

Findings

01. ViOC-AG outperforms fine-tuned vision-language models in zero-shot extraction accuracy.

02. The framework effectively integrates OCR tokens and LLM outputs for improved attribute value correction.

03. It reduces the need for manual product descriptions, streamlining e-Commerce workflows.

Abstract

Existing zero-shot product attribute value (aspect) extraction approaches in the e-Commerce industry rely on uni-modal or multi-modal models, where sellers are asked to provide detailed textual inputs (product descriptions) for their products. However, manually typing product descriptions is time-consuming and frustrating for sellers. Thus, we propose a cross-modal zero-shot attribute value generation framework (ViOC-AG) based on CLIP, which requires only product images as input. ViOC-AG follows a text-only training process, where a task-customized text decoder is trained with the frozen CLIP text encoder to alleviate the modality gap and task disconnection. During zero-shot inference, product aspects are generated by the frozen CLIP image encoder connected to the trained task-customized text decoder. OCR tokens and outputs from a frozen prompt-based LLM…
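
The train-on-text, infer-on-image swap described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the hand-written 3-d vectors stand in for frozen CLIP embeddings, the file names and attribute strings are hypothetical, and `PrototypeDecoder` is a nearest-neighbour lookup, not the paper's generative task-customized text decoder. The only point shown is that a decoder fitted on text embeddings alone can be reused on image embeddings when both encoders share one space.

```python
import math

# Hypothetical stand-ins for the frozen CLIP encoders. Both map into one
# shared 3-d space; real CLIP embeddings are learned, not hand-written.
TEXT_EMBEDDINGS = {
    "color: red": (1.0, 0.0, 0.0),
    "color: blue": (0.0, 1.0, 0.0),
    "material: leather": (0.0, 0.0, 1.0),
}
# Image vectors sit near, but not on, the matching text vectors,
# mimicking a small modality gap.
IMAGE_EMBEDDINGS = {
    "red_shirt.jpg": (0.9, 0.1, 0.2),
    "leather_bag.jpg": (0.2, 0.1, 0.8),
}


def encode_text(text):
    """Frozen 'text encoder' (lookup table, illustration only)."""
    return TEXT_EMBEDDINGS[text]


def encode_image(name):
    """Frozen 'image encoder' (lookup table, illustration only)."""
    return IMAGE_EMBEDDINGS[name]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class PrototypeDecoder:
    """Toy decoder: stores text embeddings and returns the nearest label."""

    def __init__(self):
        self.prototypes = []

    def fit(self, texts):
        # Training touches the text encoder only; no image is seen here.
        self.prototypes = [(encode_text(t), t) for t in texts]

    def decode(self, embedding):
        # Works for any vector in the shared space, including image vectors.
        return max(self.prototypes, key=lambda p: cosine(p[0], embedding))[1]


decoder = PrototypeDecoder()
decoder.fit(list(TEXT_EMBEDDINGS))

# Inference swaps in the image encoder and reuses the same decoder.
predictions = {name: decoder.decode(encode_image(name)) for name in IMAGE_EMBEDDINGS}
print(predictions)
```

The sketch leaves out the OCR and LLM correction step, because the abstract is truncated before describing it.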

Taxonomy

Topics: Web Data Mining and Analysis · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining

Methods: Contrastive Language-Image Pre-training