Synthesizing Products for Online Catalogs
Hoa Nguyen (University of Utah), Ariel Fuxman (Microsoft Research), Stelios Paparizos (Microsoft Research), Juliana Freire (University of Utah), Rakesh Agrawal (Microsoft Research)

TL;DR
This paper presents an end-to-end system for automated product synthesis in online catalogs, addressing challenges in data extraction, schema reconciliation, and data fusion, with a novel scalable schema matching technique trained without manual labels.
Contribution
The paper introduces a scalable, knowledge-based schema matching method for product synthesis that outperforms existing techniques and requires no manual labeling.
Findings
Successfully processed over 800K offers from 1000 merchants
Achieved higher precision and recall than state-of-the-art schema matching methods
Generated accurate product specifications for 400 categories
Abstract
A high-quality, comprehensive product catalog is essential to the success of Product Search engines and shopping sites such as Yahoo! Shopping, Google Product Search or Bing Shopping. But keeping catalogs up-to-date becomes a challenging task, calling for the need of automated techniques. In this paper, we introduce the problem of product synthesis, a key component of catalog creation and maintenance. Given a set of offers advertised by merchants, the goal is to identify new products and add them to the catalog together with their (structured) attributes. A fundamental challenge is the scale of the problem: a Product Search engine receives data from thousands of merchants and millions of products; the product taxonomy contains thousands of categories, where each category comes in a different schema; and merchants use representations for products that are different from the ones used in…
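To make the problem statement concrete, the following is a minimal sketch of the last two stages named in the TL;DR, schema reconciliation and data fusion, on toy data. It is not the authors' implementation. The correspondence table, the merchant names, the `mpn` grouping key, and the majority-vote fusion rule are all assumptions chosen for illustration; in the paper the attribute correspondences come from a learned schema matcher rather than a hand-written table.

```python
from collections import Counter, defaultdict

# Hypothetical correspondences: (merchant, merchant attribute) -> catalog attribute.
# Assumption: hard-coded here; a real system would learn these.
CORRESPONDENCES = {
    ("shopA", "Mfr"): "Brand",
    ("shopA", "Screen"): "Screen Size",
    ("shopB", "Manufacturer"): "Brand",
    ("shopB", "Display Size"): "Screen Size",
}

# Toy offers (invented data). "Color" has no correspondence and is dropped.
OFFERS = [
    {"merchant": "shopA", "mpn": "X100",
     "attributes": {"Mfr": "Acme", "Screen": "15 in"}},
    {"merchant": "shopB", "mpn": "X100",
     "attributes": {"Manufacturer": "ACME ", "Display Size": "15 in", "Color": "black"}},
    {"merchant": "shopB", "mpn": "X100",
     "attributes": {"Manufacturer": "Acme Corp", "Display Size": "15 in"}},
]


def reconcile(offer):
    """Rename an offer's attributes to catalog names; drop unmatched attributes."""
    reconciled = {}
    for name, value in offer["attributes"].items():
        target = CORRESPONDENCES.get((offer["merchant"], name))
        if target is not None:
            # Light normalization so trivially different spellings vote together.
            reconciled[target] = value.strip().lower()
    return reconciled


def synthesize(offers, key="mpn"):
    """Group offers by a product key and fuse each attribute by majority vote."""
    votes = defaultdict(lambda: defaultdict(Counter))
    for offer in offers:
        for attr, value in reconcile(offer).items():
            votes[offer[key]][attr][value] += 1
    return {
        product_id: {attr: counts.most_common(1)[0][0] for attr, counts in attrs.items()}
        for product_id, attrs in votes.items()
    }


products = synthesize(OFFERS)
print(products)
```

Majority voting is only one possible fusion rule, used here because it is easy to follow; it says nothing about how the paper resolves conflicting values.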
Taxonomy
Topics: Web Data Mining and Analysis · Data Quality and Management · Semantic Web and Ontologies
