TL;DR
This paper presents a supervised machine learning approach to accurately identify correct location information in event texts, significantly improving geolocation accuracy over existing dictionary-based methods.
Contribution
Introduces a two-stage supervised machine learning algorithm that classifies each location word in a news text as correct or incorrect, improving geolocation accuracy for event data.
Findings
Improves geolocation accuracy by up to 25% over dictionary methods.
Uses contextual features like N-grams and mention frequency for classification.
Validated on ICEWS and OEDA datasets with positive results.
Abstract
Extracting the "correct" location information from text data, i.e., determining the place of event, has long been a goal for automated text processing. To approximate human-like coding schema, we introduce a supervised machine learning algorithm that classifies each location word to be either correct or incorrect. We use news articles collected from around the world (Integrated Crisis Early Warning System [ICEWS] data and Open Event Data Alliance [OEDA] data) to test our algorithm that consists of two stages. In the feature selection stage, we extract contextual information from texts, namely, the N-gram patterns for location words, the frequency of mention, and the context of the sentences containing location words. In the classification stage, we use three classifiers to estimate the model parameters in the training set and then to predict whether a location word in the test set news…
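The feature selection stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `location_features`, the `window` size, the lowercase gazetteer set, and the feature names are all assumptions made for illustration. It handles single-token locations only and covers just two of the three feature families (N-gram context and mention frequency). The classification stage is left out because the abstract does not name the three classifiers.

```python
from collections import Counter


def location_features(tokens, locations, window=2):
    """Build one feature dict per location mention in a tokenized article.

    `locations` is a hypothetical gazetteer of lowercase place names;
    `window` is an assumed N-gram context width, not a value from the paper.
    """
    lowered = [t.lower() for t in tokens]
    # How often each location word is mentioned in this article.
    counts = Counter(t for t in lowered if t in locations)
    total = sum(counts.values())
    rows = []
    for i, tok in enumerate(lowered):
        if tok not in locations:
            continue
        rows.append({
            "word": tok,
            "position": i,
            # Words immediately before and after the location mention.
            "left_ngram": " ".join(lowered[max(0, i - window):i]),
            "right_ngram": " ".join(lowered[i + 1:i + 1 + window]),
            # Share of all location mentions that refer to this word.
            "mention_share": counts[tok] / total,
        })
    return rows


tokens = ("Protesters marched in Cairo on Friday while officials "
          "in Washington urged calm in Cairo").split()
rows = location_features(tokens, {"cairo", "washington"})
for row in rows:
    print(row)
```

Each resulting dict would then be paired with a human-coded correct/incorrect label and passed to whichever classifier is used in the second stage.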
