Transformer Based Geocoding

Yuval Solaz; Vitaly Shalumov

arXiv:2301.01170·cs.CL·January 4, 2023

TL;DR

This paper introduces a transformer-based sequence-to-sequence approach for geocoding, predicting geolocations from free text using a T5 model trained on geo-tagged data, with publicly available code and datasets.

Contribution

The paper formulates geocoding as a sequence-to-sequence task and trains a T5 encoder-decoder transformer for it, a novel application of transformers to geocoding.

Findings

01. Effective geolocation prediction from free text.

02. Open-source code and datasets available for reproducibility.

03. Adaptive cell partitioning improves geolocation accuracy.

Abstract

In this paper, we formulate the problem of predicting a geolocation from free text as a sequence-to-sequence problem. Using this formulation, we obtain a geocoding model by training a T5 encoder-decoder transformer model with free text as the input and a geolocation as the output. The geocoding model was trained on geo-tagged wikidump data, with adaptive cell partitioning used for the geolocation representation. All of the code, including a REST-based application, the dataset, and the model checkpoints used in this work, is publicly available.
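The abstract does not say how the adaptive cells are built, so the following is only a minimal sketch of the general idea, assuming a plain quadtree over latitude and longitude that keeps splitting a cell while it holds more than `max_points` training locations. The names `Cell`, `build_partition`, `encode`, `decode`, and both parameters are hypothetical and are not taken from the authors' code. The digit string returned by `encode` is the kind of output sequence a sequence-to-sequence model could be trained to emit for a piece of free text.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Cell:
    """A lat/lon rectangle; leaves hold points, inner nodes hold four children."""
    south: float
    west: float
    north: float
    east: float
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Cell"]] = None

    def child_index(self, lat: float, lon: float) -> int:
        """Quadrant of (lat, lon) inside this cell: 0=SW, 1=SE, 2=NW, 3=NE."""
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        return (2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)

    def split(self) -> None:
        """Turn this leaf into an inner node and push its points down."""
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        self.children = [
            Cell(self.south, self.west, mid_lat, mid_lon),  # 0: SW
            Cell(self.south, mid_lon, mid_lat, self.east),  # 1: SE
            Cell(mid_lat, self.west, self.north, mid_lon),  # 2: NW
            Cell(mid_lat, mid_lon, self.north, self.east),  # 3: NE
        ]
        for lat, lon in self.points:
            self.children[self.child_index(lat, lon)].points.append((lat, lon))
        self.points = []


def build_partition(points: Iterable[Point], max_points: int = 2,
                    max_depth: int = 12) -> Cell:
    """Assumed scheme: split any cell holding more than max_points points."""
    root = Cell(-90.0, -180.0, 90.0, 180.0, points=list(points))
    stack = [(root, 0)]
    while stack:
        cell, depth = stack.pop()
        if len(cell.points) > max_points and depth < max_depth:
            cell.split()
            stack.extend((child, depth + 1) for child in cell.children)
    return root


def encode(root: Cell, lat: float, lon: float) -> str:
    """Quadrant digits on the path from the root to the leaf containing the point."""
    digits = []
    cell = root
    while cell.children is not None:
        idx = cell.child_index(lat, lon)
        digits.append(str(idx))
        cell = cell.children[idx]
    return " ".join(digits)


def decode(root: Cell, target: str) -> Point:
    """Map a digit sequence back to the centre of the cell it reaches."""
    cell = root
    for digit in target.split():
        if cell.children is None:
            break  # sequence is longer than the tree is deep here
        cell = cell.children[int(digit)]
    return ((cell.south + cell.north) / 2, (cell.west + cell.east) / 2)


# Four points clustered around Paris and one isolated point in Sydney.
points = [(48.85, 2.35), (48.86, 2.34), (48.87, 2.33), (48.80, 2.40),
          (-33.87, 151.21)]
root = build_partition(points, max_points=2)
paris_target = encode(root, 48.85, 2.35)
sydney_target = encode(root, -33.87, 151.21)
```

In this toy run the isolated Sydney point lands in a coarse cell with a one-digit code, while the clustered Paris points are pushed into much smaller cells with longer codes. That is the sense in which such a partition is adaptive: resolution follows the density of the training data.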

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Wikis in Education and Collaboration · Algorithms and Data Compression

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Attention Dropout · Dropout · Dense Connections · Adafactor · Layer Normalization