Leveraging Subword Embeddings for Multinational Address Parsing

Marouane Yassine; David Beauchemin; François Laviolette; Luc Lamontagne

arXiv:2006.16152·cs.CL·April 12, 2022

TL;DR

This paper introduces a multilingual address parsing model using subword embeddings and RNNs, achieving high accuracy across multiple countries and enabling zero-shot transfer learning, with an open-source implementation.

Contribution

The paper presents a novel multilingual address parsing approach with a single model capable of handling multiple countries and languages, including zero-shot transfer learning capabilities.

Findings

01

Achieved ~99% accuracy on the countries used for training, with no pre-processing or post-processing.

02

Transferred address parsing knowledge zero-shot to 80% of the countries not seen during training.

03

Nearly 50% of the countries not seen during training reached near state-of-the-art performance.

Abstract

Address parsing consists of identifying the segments that make up an address, such as a street name or a postal code. Because of its importance for tasks like record linkage, address parsing has been approached with many techniques. Neural network methods have defined a new state of the art for address parsing. While this approach has yielded notable results, previous work has only applied neural networks to parsing addresses from a single source country. We propose an approach in which we employ subword embeddings and a Recurrent Neural Network architecture to build a single model capable of learning to parse addresses from multiple countries at the same time while taking into account the differences in languages and address formatting systems. We achieved accuracies around 99% on the countries used for training, with neither pre-processing nor post-processing needed. We…
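To illustrate why subword embeddings suit addresses written in many languages, here is a minimal sketch of one common technique: hashing character n-grams and averaging their vectors, in the style of fastText. It is not the authors' model. The names `char_ngrams`, `bucket_vector`, `embed`, and `embed_address`, the sizes `DIM` and `BUCKETS`, and the sample address are all assumptions made for this example, and the vectors are pseudo-random rather than trained. The point is only that any token, including one never seen before, still receives a vector built from its pieces, which a recurrent tagger could then consume token by token.

```python
import random
import zlib

DIM = 8         # illustrative embedding size (assumption, not from the paper)
BUCKETS = 1000  # illustrative number of hash buckets (assumption)


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of `word`, using < and > as boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def bucket_vector(bucket):
    """Deterministic pseudo-random vector standing in for a trained n-gram vector."""
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def embed(word):
    """Average the vectors of the hashed character n-grams of `word`."""
    grams = char_ngrams(word.lower())
    if not grams:
        return [0.0] * DIM
    total = [0.0] * DIM
    for gram in grams:
        # crc32 is stable across runs, unlike Python's built-in hash() on strings.
        bucket = zlib.crc32(gram.encode("utf-8")) % BUCKETS
        total = [t + v for t, v in zip(total, bucket_vector(bucket))]
    return [t / len(grams) for t in total]


def embed_address(address):
    """Split an address on whitespace and embed each token."""
    return [(token, embed(token)) for token in address.split()]
```

Calling `embed_address("350 rue des Lilas Ouest")` yields one `(token, vector)` pair per token. In a trained system the bucket vectors would be learned, and the resulting sequence would be fed to the recurrent network that assigns a tag (street name, postal code, and so on) to each token.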
