A Framework for Leveraging Partially-Labeled Data for Product Attribute-Value Identification

D. Subhalingam; Keshav Kolluru; Mausam; Saurabh Singal

arXiv:2405.10918 · cs.CL · November 19, 2024

TL;DR

This paper introduces GenToC, a model that learns directly from partially-labeled data to extract attribute-value pairs in e-commerce, improving both extraction accuracy and the quality of the attribute data that search and recommendation systems rely on.

Contribution

We propose GenToC, a generative and token classification model that trains directly on partially-labeled data, outperforming existing models and enabling better data augmentation for attribute-value extraction.
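One plausible way to see why an attribute-conditioned token classifier can train on partially-labeled titles: build one training example per *annotated* attribute, so an attribute the annotator missed simply yields no example instead of a wrong "no value" signal. This is a minimal sketch under stated assumptions (whitespace tokenization, B/I/O tags, annotations given as a dict of attribute to value string); `bio_labels` and `per_attribute_examples` are hypothetical helpers, not code from the paper.

```python
from typing import Dict, List, Tuple


def bio_labels(tokens: List[str], value: str) -> List[str]:
    """Tag the first occurrence of `value` in `tokens` with B/I; all other tokens get O."""
    value_toks = value.split()
    labels = ["O"] * len(tokens)
    n = len(value_toks)
    if n == 0:
        return labels  # empty value: nothing to tag
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value_toks:
            labels[start] = "B"
            for i in range(start + 1, start + n):
                labels[i] = "I"
            break
    return labels


def per_attribute_examples(
    title: str, annotated: Dict[str, str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Build one (attribute, tokens, BIO tags) example per annotated attribute.

    Attributes missing from `annotated` produce no example, so the model is
    never told that such an attribute has no value in this title.
    """
    tokens = title.split()
    return [(attr, tokens, bio_labels(tokens, value)) for attr, value in annotated.items()]


if __name__ == "__main__":
    # "Blue" is a Color value, but Color was not annotated for this title.
    examples = per_attribute_examples(
        "Apple iPhone 13 128 GB Blue", {"Brand": "Apple", "Storage": "128 GB"}
    )
    for attr, _, tags in examples:
        print(attr, tags)
```

In this toy title, Color receives no training example at all, whereas a single flat tagger trained on the same partial labels would see "Blue" marked as O.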

Findings

1. GenToC achieves up to 56.3% more correct extractions than existing state-of-the-art models.
2. Bootstrapping with GenToC improves the quality of training data for NER models.
3. Deployed at IndiaMART, GenToC increased correct attribute-value identification by 20.2%.
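This summary does not spell out the bootstrapping procedure. One plausible reading, sketched here purely as an assumption, is that GenToC's predictions fill in attribute-value pairs missing from the partial annotations before a conventional NER tagger is trained on typed BIO tags. The merge policy (gold labels take priority, overlapping predictions are dropped), whitespace tokenization, and the helper names `find_span` and `bootstrap_ner_tags` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_span(tokens: List[str], value: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first verbatim occurrence of `value`, or None."""
    v = value.split()
    if not v:
        return None
    for s in range(len(tokens) - len(v) + 1):
        if tokens[s:s + len(v)] == v:
            return s, s + len(v)
    return None


def bootstrap_ner_tags(title: str, gold: Dict[str, str], predicted: Dict[str, str]) -> List[str]:
    """Merge gold and predicted pairs into typed BIO tags such as B-Brand / I-Storage.

    Gold pairs are placed first. A predicted pair is used only for an attribute
    with no gold label, and only if its span does not overlap tokens already tagged.
    """
    tokens = title.split()
    tags = ["O"] * len(tokens)
    merged = list(gold.items()) + [(a, v) for a, v in predicted.items() if a not in gold]
    for attr, value in merged:
        span = find_span(tokens, value)
        if span is None:
            continue  # value does not appear verbatim in the title
        start, end = span
        if any(t != "O" for t in tags[start:end]):
            continue  # overlaps a span placed earlier
        tags[start] = f"B-{attr}"
        for i in range(start + 1, end):
            tags[i] = f"I-{attr}"
    return tags


if __name__ == "__main__":
    print(bootstrap_ner_tags(
        "Apple iPhone 13 128 GB Blue",
        gold={"Brand": "Apple"},
        predicted={"Brand": "iPhone", "Storage": "128 GB", "Color": "Blue"},
    ))
```

Preferring gold labels keeps human annotations authoritative while still recovering values the annotators skipped; a real pipeline might also filter predictions by model confidence.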

Abstract

In the e-commerce domain, accurately extracting attribute-value pairs (e.g., Brand: Apple) from product titles and user search queries is crucial for search and recommendation systems. A major challenge for neural models on this task is the lack of high-quality training data, because the attribute-value annotations in available datasets are often incomplete. To address this, we introduce GenToC, a model designed to train directly on partially-labeled data, removing the need for a fully annotated dataset. GenToC employs a marker-augmented generative model to identify potential attributes, followed by a token classification model that determines the associated values for each attribute. GenToC outperforms existing state-of-the-art models, with up to a 56.3% increase in the number of accurate extractions. Furthermore, we utilize GenToC to…
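The two-stage flow described in the abstract (first propose attributes, then locate each attribute's value) can be sketched as plain control flow. This is a minimal illustration under stated assumptions: whitespace tokenization, B/I/O tags, and two hypothetical callables standing in for the trained generative and token-classification models. Marker augmentation is not modeled, and the toy lexicon stubs are not the paper's models.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces standing in for the two trained models.
AttrGenerator = Callable[[str], List[str]]           # title -> candidate attribute names
ValueTagger = Callable[[List[str], str], List[str]]  # (tokens, attribute) -> B/I/O tag per token


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Join each B-started run of B/I tags into a value string; stray I tags are ignored."""
    values: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                values.append(" ".join(current))
            current = [tok]
        elif tag == "I" and current:
            current.append(tok)
        else:
            if current:
                values.append(" ".join(current))
            current = []
    if current:
        values.append(" ".join(current))
    return values


def extract_pairs(
    title: str, generate_attributes: AttrGenerator, tag_values: ValueTagger
) -> Dict[str, List[str]]:
    """Stage 1 proposes attributes; stage 2 tags the value span(s) for each one."""
    tokens = title.split()
    pairs: Dict[str, List[str]] = {}
    for attr in generate_attributes(title):
        values = decode_bio(tokens, tag_values(tokens, attr))
        if values:  # drop attributes whose value the tagger could not locate
            pairs[attr] = values
    return pairs


if __name__ == "__main__":
    # Toy lexicon-based stubs, purely for illustration.
    lexicon = {"Brand": {"Apple"}, "Color": {"Blue"}}

    def toy_generator(title: str) -> List[str]:
        words = set(title.split())
        return [a for a, vals in lexicon.items() if vals & words] + ["Material"]

    def toy_tagger(tokens: List[str], attr: str) -> List[str]:
        return ["B" if t in lexicon.get(attr, set()) else "O" for t in tokens]

    print(extract_pairs("Apple iPhone 13 Blue", toy_generator, toy_tagger))
```

Here the generator's over-proposed `Material` is discarded because the tagger finds no span for it; treating "no span found" as a filter on stage-1 output is a design assumption of this sketch.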

