PAM: Understanding Product Images in Cross Product Category Attribute Extraction

Rongmei Lin; Xiang He; Jie Feng; Nasser Zalmout; Yan Liang; Li Xiong; Xin Luna Dong

arXiv:2106.04630·cs.CV·June 10, 2021

TL;DR

This paper introduces a transformer-based multimodal framework that extracts product attributes from product text, OCR tokens, and visual objects detected in product images, improving accuracy across multiple e-commerce product categories.

Contribution

The paper presents a unified multimodal attribute extraction model that combines visual and textual cues, is conditioned on product category, and outperforms text-only methods.

Findings

01. 15% gain in recall over text-only methods

02. 10% improvement in F1 score over text-only methods

03. Effective across 14 product categories

Abstract

Understanding product attributes plays an important role in improving the online shopping experience for customers and serves as an integral part of constructing a product knowledge graph. Most existing methods focus on attribute extraction from text descriptions or utilize visual information from product images, such as shape and color. Compared to the inputs considered in prior works, a product image in fact contains more information, represented by a rich mixture of words and visual clues with a layout carefully designed to impress customers. This work proposes a more inclusive framework that fully utilizes these different modalities for attribute extraction. Inspired by recent works in visual question answering, we use a transformer-based sequence-to-sequence model to fuse representations of product text, Optical Character Recognition (OCR) tokens and visual objects detected in the…
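
The fusion idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each modality (product text, OCR tokens, visual objects) already has per-item feature vectors, projects them into a shared space with hypothetical weight matrices (`w_text`, `w_ocr`, `w_visual`), concatenates them into one sequence, and applies a single scaled dot-product self-attention step with identity query/key/value maps. All function names, dimensions, and numbers are illustrative assumptions.

```python
import math


def project(features, weights):
    """Map each feature vector (in_dim) into the shared space (out_dim).

    `weights` is an in_dim x out_dim matrix given as a list of rows.
    """
    columns = list(zip(*weights))
    return [[sum(x * w for x, w in zip(vec, col)) for col in columns]
            for vec in features]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq):
    """Single-head scaled dot-product self-attention (identity Q/K/V maps)."""
    dim = len(seq[0])
    attended = []
    for query in seq:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in seq]
        weights = softmax(scores)
        attended.append([sum(w * value[i] for w, value in zip(weights, seq))
                         for i in range(dim)])
    return attended


def fuse(text, ocr, visual, w_text, w_ocr, w_visual):
    """Concatenate projected modalities into one sequence and attend over it."""
    joint = (project(text, w_text)
             + project(ocr, w_ocr)
             + project(visual, w_visual))
    return self_attention(joint)


# Toy inputs (made-up numbers): 2 text tokens, 1 OCR token, 2 visual objects.
text = [[1.0, 0.0, 0.5], [0.2, 0.3, 0.1]]
ocr = [[0.5, 0.5]]
visual = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, 0.0, 1.0]]

# Hypothetical projection matrices into a shared 2-dimensional space.
w_text = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_ocr = [[1.0, 0.0], [0.0, 1.0]]
w_visual = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]

fused = fuse(text, ocr, visual, w_text, w_ocr, w_visual)
print(len(fused), "fused vectors of dimension", len(fused[0]))
```

Because every item attends over the whole joint sequence, a text token's output can draw on OCR and visual items, which is the sense in which the modalities are fused here. A real model would use learned projections, multiple heads and layers, and a decoder that generates the attribute value.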
