Categorizing Items with Short and Noisy Descriptions using Ensembled Transferred Embeddings
Yonatan Hadar, Erez Shmueli

TL;DR
This paper introduces Ensembled Transferred Embeddings (ETE), a novel framework for classifying e-commerce items whose textual descriptions are short and noisy. It targets settings where labeled data is scarce and compensates by transferring knowledge from related labeled datasets.
Contribution
The paper proposes ETE, a new learning framework that combines a small, semi-automatically labeled sample of the target dataset with large-scale labeled datasets from related domains or tasks to improve item categorization accuracy.
Findings
ETE significantly outperforms traditional methods.
The approach is effective on large-scale real-world datasets.
Transferable embeddings enhance classification in noisy, short-text scenarios.
Abstract
Item categorization is a machine learning task that aims to classify e-commerce items, typically represented by textual attributes, into their most suitable category from a predefined set of categories. An accurate item categorization system is essential for improving both the user experience and the company's operational processes. In this work, we focus on item categorization settings in which the textual attributes representing items are noisy and short, and labels (i.e., accurate classifications of items into categories) are not available. To cope with such settings, we propose a novel learning framework, Ensembled Transferred Embeddings (ETE), which relies on two key ideas: 1) labeling a relatively small sample of the target dataset in a semi-automatic process, and 2) leveraging other datasets from related domains or related tasks that are large-scale and labeled,…
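The abstract names the two ingredients of ETE but not the model details. The following is a minimal sketch of the general pattern only: representations learned on large, labeled related datasets are applied to short, noisy target items, the per-source embeddings are ensembled, and a classifier is fit on the small labeled target sample. It is not the authors' implementation. The count-based token vectors (standing in for learned embeddings), concatenation as the ensembling step, the nearest-centroid classifier, every function name, and the toy data are assumptions for illustration. The semi-automatic labeling step is not modeled; the small labeled target sample is assumed to be given.

```python
from collections import Counter, defaultdict
import math


def tokenize(text):
    # Short, noisy descriptions: lowercase and split on whitespace only.
    return text.lower().split()


def fit_source_encoder(texts, labels):
    """Learn a token -> vector map from one labeled *source* dataset.

    Hypothetical stand-in for a learned embedding: each token is represented
    by its add-one smoothed distribution over the source dataset's categories.
    """
    categories = sorted(set(labels))
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for tok in tokenize(text):
            counts[tok][label] += 1
    table = {}
    for tok, c in counts.items():
        total = sum(c.values()) + len(categories)
        table[tok] = [(c[cat] + 1) / total for cat in categories]
    dim = len(categories)

    def encode(text):
        # Average the vectors of known tokens; unknown tokens are skipped.
        vecs = [table[t] for t in tokenize(text) if t in table]
        if not vecs:
            return [0.0] * dim
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    return encode


def ensemble_embed(encoders, text):
    # Ensemble step (assumed here): concatenate every source's embedding.
    out = []
    for enc in encoders:
        out.extend(enc(text))
    return out


def fit_nearest_centroid(encoders, texts, labels):
    """Fit a nearest-centroid classifier on the small labeled target sample."""
    sums, n = {}, Counter()
    for text, label in zip(texts, labels):
        v = ensemble_embed(encoders, text)
        prev = sums.get(label, [0.0] * len(v))
        sums[label] = [a + b for a, b in zip(prev, v)]
        n[label] += 1
    centroids = {lab: [x / n[lab] for x in s] for lab, s in sums.items()}

    def predict(text):
        v = ensemble_embed(encoders, text)
        return min(centroids, key=lambda lab: math.dist(v, centroids[lab]))

    return predict


# Toy data (hypothetical): two related, labeled source datasets.
enc_a = fit_source_encoder(
    ["usb charger cable", "phone case black", "chocolate bar 100g", "salted chips bag"],
    ["electronics", "electronics", "grocery", "grocery"],
)
enc_b = fit_source_encoder(
    ["iphone charger fast", "samsung phone case", "dark chocolate", "potato chips"],
    ["mobile", "mobile", "snacks", "snacks"],
)
encoders = [enc_a, enc_b]

# Small labeled target sample with noisy, abbreviated descriptions.
predict = fit_nearest_centroid(
    encoders,
    ["chrgr usb 2m", "choc bar", "phone case", "chips bag"],
    ["accessories", "food", "accessories", "food"],
)

print(predict("usb cable black"))  # accessories
print(predict("dark chocolate"))   # food
```

Note that the target categories ("accessories", "food") differ from both source label sets. The source datasets only supply the representation, so the small target sample is still needed to define the target categories.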
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
