A Framework for Leveraging Partially-Labeled Data for Product Attribute-Value Identification
D. Subhalingam, Keshav Kolluru, Mausam, Saurabh Singal

TL;DR
This paper introduces GenToC, a model that learns directly from partially-labeled data to extract attribute-value pairs in e-commerce, improving extraction accuracy and training-data quality for search and recommendation systems.
Contribution
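We propose GenToC, a two-stage model that pairs a marker-augmented generative model (to identify attributes) with a token classification model (to extract the value of each attribute). It trains directly on partially-labeled data, outperforms existing models, and enables better data augmentation for attribute-value extraction.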
Findings
GenToC yields up to a 56.3% increase in the number of accurate extractions over existing state-of-the-art models.
Bootstrapping with GenToC improves training data quality for NER models.
Successful deployment in IndiaMART increases correct attribute-value identification by 20.2%.
Abstract
In the e-commerce domain, the accurate extraction of attribute-value pairs (e.g., Brand: Apple) from product titles and user search queries is crucial for enhancing search and recommendation systems. A major challenge with neural models for this task is the lack of high-quality training data, as the annotations for attribute-value pairs in the available datasets are often incomplete. To address this, we introduce GenToC, a model designed for training directly with partially-labeled data, eliminating the necessity for a fully annotated dataset. GenToC employs a marker-augmented generative model to identify potential attributes, followed by a token classification model that determines the associated values for each attribute. GenToC outperforms existing state-of-the-art models, exhibiting up to a 56.3% increase in the number of accurate extractions. Furthermore, we utilize GenToC to…
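The two-stage flow described in the abstract (first propose attributes, then label the value tokens for each attribute) can be pictured with a small Python sketch. This is an illustration only, not the authors' implementation: the lexicon and the functions `propose_attributes`, `tag_values`, and `extract_pairs` are hypothetical names, and simple dictionary lookups stand in for the generative and token classification models.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon standing in for what trained models would learn.
TOY_LEXICON: Dict[str, Set[str]] = {
    "Brand": {"apple", "samsung"},
    "Color": {"black", "red"},
    "Storage": {"128gb", "256gb"},
}


def propose_attributes(tokens: List[str]) -> List[str]:
    """Stage 1 stand-in: list the attributes that appear to be present.

    In the paper this role is played by a marker-augmented generative model;
    here it is a plain lexicon lookup (an assumption for illustration).
    """
    lowered = {t.lower() for t in tokens}
    return [attr for attr, vocab in TOY_LEXICON.items() if lowered & vocab]


def tag_values(tokens: List[str], attribute: str) -> List[int]:
    """Stage 2 stand-in: one 0/1 label per token for a single attribute.

    In the paper this role is played by a token classification model.
    """
    vocab = TOY_LEXICON[attribute]
    return [1 if t.lower() in vocab else 0 for t in tokens]


def extract_pairs(title: str) -> Dict[str, str]:
    """Run stage 1, then stage 2 once per proposed attribute."""
    tokens = title.split()
    pairs: Dict[str, str] = {}
    for attr in propose_attributes(tokens):
        labels = tag_values(tokens, attr)
        # Keep only the tokens labeled as part of this attribute's value.
        pairs[attr] = " ".join(t for t, y in zip(tokens, labels) if y)
    return pairs


if __name__ == "__main__":
    print(extract_pairs("Apple iPhone 13 128GB Black"))
```

Calling `extract_pairs` on a product title returns a dictionary mapping each proposed attribute to the tokens labeled as its value. In a real system, both lookups would be replaced by trained models.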
Taxonomy
Topics: Rough Sets and Fuzzy Logic
Methods: Sparse Evolutionary Training
