Leveraging Large Language Models to Geolocate Linguistic Variations in Social Media Posts

Davide Savarro; Davide Zago; Stefano Zoia

arXiv:2407.16047 · cs.CL · July 24, 2024

TL;DR

This paper presents a method that fine-tunes large language models to geolocate Italian social media posts, predicting both the region and the coordinates of each post, with the aim of advancing the state of the art in linguistic geolocation.

Contribution

The study introduces a novel approach of fine-tuning LLMs specifically for Italian social media geolocalization, addressing both regional and precise coordinate prediction tasks.

Findings

01 Enhanced geolocalization accuracy over previous methods

02 Effective understanding of Italian social media linguistic nuances

03 Open-source code for reproducibility and further research

Abstract

Geolocalization of social media content is the task of determining the geographical location of a user based on textual data, which may show linguistic variations and informal language. In this project, we address the GeoLingIt challenge of geolocalizing tweets written in Italian by leveraging large language models (LLMs). GeoLingIt requires the prediction of both the region and the precise coordinates of the tweet. Our approach involves fine-tuning pre-trained LLMs to simultaneously predict these geolocalization aspects. By integrating innovative methodologies, we enhance the models' ability to understand the nuances of Italian social media text to improve the state of the art in this domain. This work is conducted as part of the Large Language Models course at the Bertinoro International Spring School 2024. We make our code publicly available on GitHub…
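
The abstract does not say how the region and coordinate predictions are combined during fine-tuning. As an illustration only, the following is a minimal sketch of one plausible joint objective, assuming a weighted sum of a cross-entropy term over region logits and a great-circle (haversine) distance term over predicted coordinates. The function names and the `alpha` and `km_scale` parameters are hypothetical and are not taken from the paper or its repository.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before sqrt/asin.
    return 2 * r * math.asin(math.sqrt(min(1.0, a)))


def cross_entropy(logits, target_index):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def joint_loss(region_logits, true_region, pred_coords, true_coords,
               alpha=0.5, km_scale=100.0):
    """Hypothetical joint objective (an assumption, not the authors' loss).

    alpha weights the region term against the coordinate term; km_scale
    rescales kilometres so both terms have a comparable magnitude.
    """
    region_term = cross_entropy(region_logits, true_region)
    coord_term = haversine_km(*pred_coords, *true_coords) / km_scale
    return alpha * region_term + (1 - alpha) * coord_term


# Example: a prediction near Milan for a tweet actually posted from Rome.
rome = (41.9028, 12.4964)
milan = (45.4642, 9.1900)
uniform_logits = [0.0] * 20  # one logit per Italian region, all equal
loss = joint_loss(uniform_logits, 0, milan, rome)
```

In this sketch, `alpha` trades off classification against regression; setting it to 1.0 or 0.0 reduces the objective to a single task, which can be a useful baseline when checking whether joint training helps.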

Code & Models

Repositories

dawoz/geolingit-biss2024 (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems