LaTeX-Numeric: Language-agnostic Text attribute eXtraction for E-commerce Numeric Attributes
Kartik Mehta, Ioana Oprea, Nikhil Rasiwasia

TL;DR
LaTeX-Numeric is a scalable, fully-automated framework that extracts numeric attributes from e-commerce product texts using distant supervision and multi-task learning, significantly improving accuracy without manual labeling.
Contribution
It introduces a novel multi-task learning approach and automated alias creation techniques for high-precision numeric attribute extraction in e-commerce texts.
Findings
Achieves a 9.2% F1 improvement for numeric attributes with multi-task learning over a single-task architecture.
Attains a 20.2% F1 boost through automated alias creation.
Demonstrates language-agnostic effectiveness with 13.9% F1 gain in Romance languages.
Abstract
In this paper, we present LaTeX-Numeric, a high-precision, fully automated, scalable framework for extracting e-commerce numeric attributes from product text such as product descriptions. Most past work on attribute extraction is not scalable because it relies on manually curated training data, either with or without the use of active learning. We rely on distant supervision for training data generation, removing the dependency on manual labels. One issue with distant supervision is that it leads to incomplete training annotations because attribute values are missing during matching. We propose a multi-task learning architecture to deal with missing labels in the training data, leading to an F1 improvement of 9.2% for numeric attributes over a single-task architecture. While the multi-task architecture benefits both numeric and non-numeric attributes, we present automated techniques to further improve the…
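The incomplete-annotation problem that the abstract attributes to distant supervision can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `distant_labels`, the BIO tag names, the whitespace tokenization, and the hand-written unit list are all assumptions made for illustration. The idea shown is only that labels are produced by matching a known catalog value against the text, so any attribute whose catalog value is missing goes unlabeled.

```python
def distant_labels(tokens, value, units):
    """Tag a numeric value followed by a unit alias with BIO labels.

    Hypothetical helper: `value` is the catalog value for one attribute,
    `units` is a list of unit aliases assumed to be known in advance.
    """
    labels = ["O"] * len(tokens)
    unit_set = {u.lower() for u in units}
    for i in range(len(tokens) - 1):
        # Label only exact matches of the catalog value next to a known unit.
        if tokens[i] == value and tokens[i + 1].lower() in unit_set:
            labels[i] = "B-ATTR"
            labels[i + 1] = "I-ATTR"
    return labels


# Toy product text; the catalog is assumed to hold a screen size of "15"
# but no weight value, so "2 kg" cannot be matched.
tokens = "Ships with a 15 inch display and weighs 2 kg".split()
labels = distant_labels(tokens, "15", ["inch", "inches", "in"])

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

In this toy run, "15 inch" is tagged while "2 kg" stays `O` even though it is a genuine attribute mention. Training a tagger on such data treats the unlabeled mention as a negative example, which is the kind of missing-label noise the abstract says the multi-task architecture is meant to handle.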
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
