Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

Mengyin Liu; Chao Zhu; Hongyu Gao; Weibo Gu; Hongfa Wang; Wei Liu; Xu-cheng Yin

arXiv:2207.07278 · cs.CV · April 7, 2023

TL;DR

This paper introduces a unified learning scheme and dynamic range minimization techniques to improve multi-modal e-commerce attribute extraction, effectively leveraging pretrained models and reducing false positives across diverse product data.

Contribution

The paper proposes a novel unified training framework and adaptive range minimization methods that enhance multi-modal attribute extraction by better utilizing pretrained models and prior knowledge.

Findings

01

Achieves superior performance on e-commerce benchmarks.

02

Effectively reduces false positives in attribute prediction.

03

Demonstrates the benefit of joint fine-tuning and range minimization techniques.

Abstract

With the prosperity of the e-commerce industry, various modalities, e.g., vision and language, are used to describe product items. Understanding such diversified data is an enormous challenge, especially when extracting attribute-value pairs from text sequences with the aid of helpful image regions. Although a series of previous works have been dedicated to this task, there remain rarely investigated obstacles that hinder further improvements: 1) Parameters from upstream single-modal pretraining are inadequately applied, without proper joint fine-tuning in a downstream multi-modal task. 2) To select descriptive parts of images, a simple late fusion is widely applied, regardless of the prior knowledge that language-related information should be encoded into a common linguistic embedding space by stronger encoders. 3) Due to diversity across products, their attribute sets tend to…
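The abstract frames the task as extracting attribute-value pairs from a product's text sequence. A common way to set this up, assumed here because the abstract does not state the paper's output format, is token-level BIO tagging, where each token is labeled `B-<attribute>`, `I-<attribute>`, or `O`. The minimal sketch below only shows how such tags could be decoded into pairs; it does not model the image branch, the unified learning scheme, or dynamic range minimization. The function name `decode_bio` and the tag strings are hypothetical.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (attribute, value) pairs from hypothetical BIO tags,
    e.g. 'B-Color', 'I-Color', 'O'."""
    pairs: List[Tuple[str, str]] = []
    current_attr: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the open span (if any) and record it as one pair.
        nonlocal current_attr, current_tokens
        if current_attr is not None and current_tokens:
            pairs.append((current_attr, " ".join(current_tokens)))
        current_attr, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span for its attribute.
            flush()
            current_attr, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_attr == tag[2:]:
            # An I- tag with the same attribute extends the open span.
            current_tokens.append(token)
        else:
            # 'O', or an I- tag that does not continue the open span,
            # closes the span; the stray I- token itself is dropped.
            flush()
    flush()
    return pairs


if __name__ == "__main__":
    tokens = ["navy", "blue", "cotton", "t-shirt", "for", "men"]
    tags = ["B-Color", "I-Color", "B-Material", "O", "O", "B-Gender"]
    print(decode_bio(tokens, tags))
```

Running the usage example prints `[('Color', 'navy blue'), ('Material', 'cotton'), ('Gender', 'men')]`. Treating a stray `I-` tag as a span terminator is one conservative choice; other decoders promote it to the start of a new span instead.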

Taxonomy

Topics: Text and Document Classification Technologies · Image Retrieval and Classification Techniques · Web Data Mining and Analysis