# Who wrote this book? A challenge for e-commerce

**Authors:** Béranger Dumont, Simona Maggio, Ghiles Sidi Said, Quoc-Tien Au

arXiv: 1905.01973 · 2019-05-07

## TL;DR

This paper presents a deep learning-based system that normalizes author names in e-commerce book catalogs, resolving inconsistencies caused by abbreviations, spelling variants, and mistakes.

## Contribution

It introduces a composite system that combines open data sources for books with deep learning components: Siamese neural networks for approximate matching against known author names, and sequence-to-sequence models for direct correction of the provided author name.

## Key findings

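- The system's top proposal is the normalized author name with 72% accuracy on product data from Rakuten France
- Targets author-name inconsistencies caused by abbreviations, spelling variants, and mistakes
- Designed to work at the scale of e-commerce catalogs containing millions of references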

## Abstract

Modern e-commerce catalogs contain millions of references, associated with textual and visual information that is of paramount importance for the products to be found via search or browsing. Of particular significance is the book category, where the author name(s) field poses a significant challenge. Indeed, books written by a given author (such as F. Scott Fitzgerald) might be listed with different authors' names in a catalog due to abbreviations and spelling variants and mistakes, among others. To solve this problem at scale, we design a composite system involving open data sources for books as well as machine learning components leveraging deep learning-based techniques for natural language processing. In particular, we use Siamese neural networks for an approximate match with known author names, and direct correction of the provided author's name using sequence-to-sequence learning with neural networks. We evaluate this approach on product data from the e-commerce website Rakuten France, and find that the top proposal of the system is the normalized author name with 72% accuracy.
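To make the "approximate match with known author names" step concrete, here is a minimal sketch. It is not the paper's method: the learned Siamese embedding is replaced by a simple, untrained stand-in (character trigram counts compared with cosine similarity), and the function names and the small `KNOWN_AUTHORS` list are hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Count character n-grams of a lowercased, space-padded name."""
    padded = f" {name.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(query: str, known_names: list[str]) -> list[tuple[str, float]]:
    """Rank known author names by similarity to the catalog's author string."""
    q = char_ngrams(query)
    scored = [(name, cosine(q, char_ngrams(name))) for name in known_names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference list; a real system would draw on open data sources.
KNOWN_AUTHORS = ["F. Scott Fitzgerald", "Victor Hugo", "Jules Verne"]

ranking = rank_candidates("Scott Fitzgerald", KNOWN_AUTHORS)
top_name, top_score = ranking[0]
print(top_name, round(top_score, 3))
```

Here the first entry of `ranking` plays the role of the system's top proposal. A learned Siamese encoder would replace `char_ngrams` with a trained embedding, which is what could let variants such as initials or misspellings land close to the canonical name even when few trigrams overlap.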

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.01973/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1905.01973/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1905.01973/full.md

---
Source: https://tomesphere.com/paper/1905.01973