Transformer Based Geocoding
Yuval Solaz, Vitaly Shalumov

TL;DR
This paper introduces a transformer-based sequence-to-sequence approach for geocoding, predicting geolocations from free text using a T5 model trained on geo-tagged data, with publicly available code and datasets.
Contribution
It formulates geocoding as a sequence-to-sequence task and trains a T5 transformer model specifically for this purpose, which is a novel application of transformers in geocoding.
Findings
Effective geolocation prediction from free text.
Open-source code and datasets available for reproducibility.
Adaptive cell partitioning improves geolocation accuracy.
Abstract
In this paper, we formulate the problem of predicting a geolocation from free text as a sequence-to-sequence problem. Using this formulation, we obtain a geocoding model by training a T5 encoder-decoder transformer model with free text as the input and a geolocation as the output. The geocoding model was trained on geo-tagged wikidump data, with adaptive cell partitioning used for the geolocation representation. All of the code, including a REST-based application, the dataset, and the model checkpoints used in this work, is publicly available.
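The abstract does not spell out how adaptive cell partitioning turns a coordinate into an output sequence, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a quadtree over latitude/longitude that splits a cell only while it holds more than `max_points` training points, and it assumes the target sequence is the space-separated path of quadrant digits from the root to a leaf. The names `Cell`, `build_partition`, `encode`, and `decode` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Cell:
    """A rectangular lat/lon cell; `children` is None for a leaf."""
    south: float
    west: float
    north: float
    east: float
    children: Optional[List["Cell"]] = None

    def quadrant(self, lat: float, lon: float) -> int:
        """Index of the child quadrant containing the point (0=SW, 1=SE, 2=NW, 3=NE)."""
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        return (2 if lat >= mid_lat else 0) + (1 if lon >= mid_lon else 0)

    def split(self) -> None:
        """Divide this cell into four equal quadrants."""
        mid_lat = (self.south + self.north) / 2
        mid_lon = (self.west + self.east) / 2
        self.children = [
            Cell(self.south, self.west, mid_lat, mid_lon),  # 0: SW
            Cell(self.south, mid_lon, mid_lat, self.east),  # 1: SE
            Cell(mid_lat, self.west, self.north, mid_lon),  # 2: NW
            Cell(mid_lat, mid_lon, self.north, self.east),  # 3: NE
        ]


def build_partition(points: List[Point], cell: Cell,
                    max_points: int, max_depth: int, depth: int = 0) -> Cell:
    """Split recursively only where data is dense (assumed adaptivity criterion)."""
    if len(points) <= max_points or depth >= max_depth:
        return cell
    cell.split()
    buckets: List[List[Point]] = [[] for _ in range(4)]
    for lat, lon in points:
        buckets[cell.quadrant(lat, lon)].append((lat, lon))
    for child, bucket in zip(cell.children, buckets):
        build_partition(bucket, child, max_points, max_depth, depth + 1)
    return cell


def encode(root: Cell, lat: float, lon: float) -> str:
    """Geolocation -> target sequence: the quadrant path down to a leaf cell."""
    digits = []
    cell = root
    while cell.children is not None:
        q = cell.quadrant(lat, lon)
        digits.append(str(q))
        cell = cell.children[q]
    return " ".join(digits)


def decode(root: Cell, target: str) -> Point:
    """Target sequence -> geolocation: the centre of the cell the path reaches."""
    cell = root
    for token in target.split():
        if cell.children is None:  # path is longer than the tree; stop at the leaf
            break
        cell = cell.children[int(token)]
    return ((cell.south + cell.north) / 2, (cell.west + cell.east) / 2)


# Illustrative data: five points clustered near Paris, one near Sydney.
points = [(48.85, 2.35), (48.86, 2.34), (48.84, 2.36),
          (48.87, 2.33), (48.83, 2.37), (-33.87, 151.21)]
root = build_partition(points, Cell(-90.0, -180.0, 90.0, 180.0),
                       max_points=2, max_depth=8)

paris_target = encode(root, 48.85, 2.35)      # long path: dense region, small cell
sydney_target = encode(root, -33.87, 151.21)  # short path: sparse region, large cell
```

In a training pipeline of the kind the abstract describes, each example would pair a free-text input with a string such as `paris_target`, and a predicted string would be mapped back to coordinates with `decode`. Dense regions receive longer sequences and therefore finer resolution, which is the intuition behind making the partition adaptive.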
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Wikis in Education and Collaboration · Algorithms and Data Compression
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Attention Dropout · Dropout · Dense Connections · Adafactor · Layer Normalization
