Where on Earth Do Users Say They Are?: Geo-Entity Linking for Noisy Multilingual User Input
Tessa Masis, Brendan O'Connor

TL;DR
This paper introduces a scalable, multilingual geo-entity linking method for noisy social media data, utilizing averaged location embeddings and confidence scores to improve accuracy across diverse geographic regions.
Contribution
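It represents each real-world location as the average of embeddings of labeled user-input location names and supports selective prediction through an interpretable confidence score. This addresses the limitations of rule-based tools, which break easily on social media text, and LLM-based tools, which are too expensive for large-scale datasets.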
Findings
Improved geo-entity linking accuracy on global multilingual social media data.
Effective handling of noisy, multilingual location mentions.
Discussion of evaluation challenges at different geographic granularities.
Abstract
Geo-entity linking is the task of linking a location mention to the real-world geographic location. In this paper we explore the challenging task of geo-entity linking for noisy, multilingual social media data. There are few open-source multilingual geo-entity linking tools available and existing ones are often rule-based, which break easily in social media settings, or LLM-based, which are too expensive for large-scale datasets. We present a method which represents real-world locations as averaged embeddings from labeled user-input location names and allows for selective prediction via an interpretable confidence score. We show that our approach improves geo-entity linking on a global and multilingual social media dataset, and discuss progress and problems with evaluating at different geographic granularities.
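The core idea in the abstract, averaged location embeddings plus a confidence score used for selective prediction, can be illustrated with a minimal sketch. This is not the authors' implementation. The hashed character-trigram encoder (`embed`), the use of cosine similarity as the confidence score, the `threshold` value, and all function names are assumptions made for illustration; the paper's actual encoder and confidence definition may differ.

```python
import hashlib
import math

DIM = 1024  # size of the toy embedding space (assumption)


def embed(text, dim=DIM):
    """Toy stand-in encoder: hashed character trigrams, L2-normalized.

    A real system would likely use a learned multilingual text encoder.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower().strip()}  "
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        idx = int(hashlib.md5(gram.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_location_embeddings(labeled_names):
    """Average the embeddings of all user-input names labeled with each location."""
    sums, counts = {}, {}
    for name, location in labeled_names:
        vec = embed(name)
        if location not in sums:
            sums[location] = [0.0] * DIM
            counts[location] = 0
        sums[location] = [a + b for a, b in zip(sums[location], vec)]
        counts[location] += 1
    return {loc: [v / counts[loc] for v in s] for loc, s in sums.items()}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_mention(mention, location_embeddings, threshold=0.5):
    """Return (location or None, confidence).

    Confidence here is the similarity to the nearest averaged embedding
    (an assumption). Below the threshold the linker abstains, which is
    the selective-prediction step.
    """
    query = embed(mention)
    best_location, confidence = max(
        ((loc, cosine(query, vec)) for loc, vec in location_embeddings.items()),
        key=lambda pair: pair[1],
    )
    return (best_location if confidence >= threshold else None), confidence


# Hypothetical labeled user-input location strings.
labeled = [
    ("Paris, France", "Paris"), ("paris", "Paris"), ("París", "Paris"),
    ("Tokyo", "Tokyo"), ("tokyo japan", "Tokyo"), ("東京", "Tokyo"),
]
location_embeddings = build_location_embeddings(labeled)
print(link_mention("paris france", location_embeddings))
print(link_mention("zzzzqqqq", location_embeddings))
```

Raising `threshold` trades coverage for precision: fewer mentions are linked, but the ones that are linked sit closer to a known location's averaged embedding.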
Taxonomy
Topics: Geographic Information Systems Studies · Data Quality and Management · Data Management and Algorithms
