# A Clustering-Based Combinatorial Approach to Unsupervised Matching of Product Titles

**Authors:** Leonidas Akritidis, Athanasios Fevgas, Panayiotis Bozanis, Christos Makris

arXiv: 1903.04276 · 2019-03-12

## TL;DR

This paper presents UPM, an unsupervised, parameter-free clustering algorithm that matches e-commerce products by their titles. UPM analyzes combinations of words extracted from the titles, needs neither external data nor pairwise product comparisons, and is reported to outperform state-of-the-art approaches in both efficiency and effectiveness.

## Contribution

The paper introduces UPM, a novel unsupervised and parameter-free clustering approach that effectively matches products based on titles without external data or pairwise comparisons.

## Key findings

- In the experimental evaluation, UPM was more efficient than the state-of-the-art approaches it was compared against.
- UPM was also more effective at matching products than those approaches.
- The approach is independent of external data sources.

## Abstract

The constant growth of the e-commerce industry has rendered the problem of product retrieval particularly important. As more enterprises move their activities on the Web, the volume and the diversity of the product-related information increase quickly. These factors make it difficult for the users to identify and compare the features of their desired products. Recent studies proved that the standard similarity metrics cannot effectively identify identical products, since similar titles often refer to different products and vice-versa. Other studies employed external data sources (search engines) to enrich the titles; these solutions are rather impractical mainly because the external data fetching is slow. In this paper we introduce UPM, an unsupervised algorithm for matching products by their titles. UPM is independent of any external sources, since it analyzes the titles and extracts combinations of words out of them. These combinations are evaluated according to several criteria, and the most appropriate of them constitutes the cluster where a product is classified into. UPM is also parameter-free, it avoids product pairwise comparisons, and includes a post-processing verification stage which corrects the erroneous matches. The experimental evaluation of UPM demonstrated its superiority against the state-of-the-art approaches in terms of both efficiency and effectiveness.
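The clustering idea described in the abstract (extract word combinations from each title, score them, and let the best one name the cluster) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the scoring criteria, so the score used here (squared combination length times the number of titles containing the combination) is a hypothetical stand-in, and the function names and the `max_size` cap are assumptions introduced only to keep the example small. The post-processing verification stage is not covered.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(title):
    """Lowercase a title and split on whitespace, keeping each token once."""
    seen = []
    for token in title.lower().split():
        if token not in seen:
            seen.append(token)
    return seen


def word_combinations(tokens, max_size=3):
    """Yield every sorted combination of 1..max_size tokens of a title."""
    ordered = sorted(tokens)
    for size in range(1, min(max_size, len(ordered)) + 1):
        for combo in combinations(ordered, size):
            yield combo


def cluster_titles(titles, max_size=3):
    """Assign each title to the cluster named by its best-scoring combination."""
    # Pass 1: count in how many titles each combination occurs.
    frequency = Counter()
    for title in titles:
        frequency.update(set(word_combinations(tokenize(title), max_size)))

    # Pass 2: pick one combination per title; titles are never compared pairwise.
    clusters = defaultdict(list)
    for title in titles:
        combos = set(word_combinations(tokenize(title), max_size))
        if not combos:
            continue  # skip empty titles
        # Hypothetical score (an assumption, not the paper's criteria):
        # longer and more frequent combinations win; ties break alphabetically.
        best = max(combos, key=lambda c: (len(c) ** 2 * frequency[c], c))
        clusters[best].append(title)
    return dict(clusters)


if __name__ == "__main__":
    sample = [
        "Apple iPhone 7 32GB Black",
        "apple iphone 7 32gb black smartphone",
        "Samsung Galaxy S8 64GB",
    ]
    for combo, members in cluster_titles(sample).items():
        print(combo, members)
```

In this sketch the two iPhone titles share their highest-scoring combination and so fall into the same cluster, while the Samsung title forms its own. Each title is processed independently against the global combination counts, which is what avoids comparing products pairwise.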

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.04276/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.04276/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/1903.04276/full.md

---
Source: https://tomesphere.com/paper/1903.04276