Robust Product Classification with Instance-Dependent Noise
Huy Nguyen, Devashish Khatwani

TL;DR
This paper investigates the impact of instance-dependent label noise on product title classification and proposes a novel noise simulation algorithm to improve model robustness against noisy labels in e-commerce data.
Contribution
The paper introduces a new noise simulation algorithm based on product title similarity and compares it with existing noise-resistant training methods for robust classification.
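The similarity-based noise algorithm is described here only at a high level. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses bag-of-words cosine similarity between titles. Each item is paired with its most similar title from a different category. The items that are most confusable with another category are then flipped to that neighbour's label, so the noise depends on the instance rather than being uniform. The function names `bag_of_words`, `cosine`, and `simulate_similarity_noise` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(title):
    """Lowercased whitespace-token counts for one product title (assumed featurization)."""
    return Counter(title.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def simulate_similarity_noise(titles, labels, noise_rate):
    """Return a copy of `labels` with round(noise_rate * n) labels flipped.

    Hypothetical sketch: every item is paired with its most similar title
    from a different category, and the most confusable items are flipped
    to that neighbour's label.
    """
    vecs = [bag_of_words(t) for t in titles]
    candidates = []  # (similarity, item index, label to flip to)
    for i, vi in enumerate(vecs):
        best_sim, best_label = -1.0, None
        for j, vj in enumerate(vecs):
            if labels[j] == labels[i]:
                continue  # noise must come from another category
            sim = cosine(vi, vj)
            if sim > best_sim:
                best_sim, best_label = sim, labels[j]
        if best_label is not None:
            candidates.append((best_sim, i, best_label))

    # Most confusable items first; sort is stable, so ties keep input order.
    candidates.sort(key=lambda c: -c[0])
    noisy = list(labels)
    for _, i, new_label in candidates[:round(noise_rate * len(labels))]:
        noisy[i] = new_label
    return noisy


titles = ["apple iphone 12 case", "iphone 12 screen protector",
          "wooden dining table", "oak dining chair"]
labels = ["case", "screen", "furniture", "furniture"]
print(simulate_similarity_noise(titles, labels, 0.5))
# ['screen', 'case', 'furniture', 'furniture']
```

With a 50% noise rate, the two phone-accessory titles share the most tokens across categories, so they are the ones mislabeled. The furniture items keep their labels. Flipping the most similar cross-category pairs first mimics the kind of confusion a human annotator might plausibly make. Uniform random flipping would not capture this.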
Findings
Noise significantly degrades classification performance at high noise rates.
The proposed similarity-based noise simulation improves robustness in noisy environments.
Performance limits are identified when the noise rate is high and the data distribution is skewed.
Abstract
Noisy labels in large e-commerce product data (i.e., product items placed into incorrect categories) are a critical issue for the product categorization task because they are unavoidable, non-trivial to remove, and significantly degrade prediction performance. Training a product title classification model that is robust to noisy labels is therefore essential for making product classification applications practical. In this paper, we study the impact of instance-dependent noise on the performance of product title classification by comparing our data denoising algorithm with different noise-resistant training algorithms designed to prevent a classifier from overfitting to noise. We develop a simple yet effective Deep Neural Network for product title classification to serve as the base classifier. Along with recent methods of simulating instance-dependent noise, we…
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Balanced Selection
