Multi-Modal Attribute Extraction for E-Commerce
Aloïs De la Comble, Anuvabh Dutt, Pablo Montalvo, Aghiles Salah

TL;DR
This paper presents a novel multimodal deep learning approach combining text and images to predict product attributes in e-commerce, addressing incomplete attribute data and improving catalog organization.
Contribution
The paper introduces a new modality-merging technique, with regularization to prevent modality collapse, that improves attribute extraction from unstructured text and images.
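Modality collapse refers to a multimodal model learning to rely on one modality and ignore the other. This summary does not describe the paper's actual merging technique or regularizer, so the sketch below is not the authors' method. It illustrates one common regularizer of this kind, modality dropout, applied to a simple concatenation of text and image embeddings. The names `fuse` and `p_drop` and the plain-list embeddings are hypothetical and chosen only for illustration.

```python
import random


def fuse(text_emb, image_emb, p_drop=0.3, training=True, rng=random):
    """Concatenate text and image embeddings (hypothetical helper).

    During training, with probability p_drop, exactly one modality is
    replaced by zeros so a downstream classifier cannot depend on a
    single modality alone. This is generic modality dropout, assumed
    here for illustration; it is not taken from the paper.
    """
    text = list(text_emb)
    image = list(image_emb)
    if training and rng.random() < p_drop:
        # Zero out one modality, chosen uniformly; never both at once.
        if rng.random() < 0.5:
            text = [0.0] * len(text)
        else:
            image = [0.0] * len(image)
    return text + image


if __name__ == "__main__":
    # At inference time both modalities are always kept.
    print(fuse([0.2, 0.7], [0.5], training=False))
```

At inference (`training=False`) the function is a plain concatenation; the random zeroing only affects training batches.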
Findings
Multimodal model outperforms unimodal baselines
Regularization mitigates modality collapse issues
Model successfully deployed on Rakuten-Ichiba data
Abstract
To improve users' experience as they navigate the myriad of options offered by online marketplaces, it is essential to have well-organized product catalogs. One key ingredient to that is the availability of product attributes such as color or material. However, on some marketplaces such as Rakuten-Ichiba, which we focus on, attribute information is often incomplete or even missing. One promising solution to this problem is to rely on deep models pre-trained on large corpora to predict attributes from unstructured data, such as product descriptive texts and images (referred to as modalities in this paper). However, we find that achieving satisfactory performance with this approach is not straightforward but rather the result of several refinements, which we discuss in this paper. We provide a detailed description of our approach to attribute extraction, from investigating strong…
Taxonomy
Topics: Web Data Mining and Analysis · Image Retrieval and Classification Techniques · Text and Document Classification Technologies
