Text Classification of the Precursory Accelerating Seismicity Corpus: Inference on some Theoretical Trends in Earthquake Predictability Research from 1988 to 2018
Arnaud Mignan

TL;DR
This study applies machine learning classifiers to seismology literature to analyze trends in earthquake predictability, finding Naive Bayes most effective for small datasets but with limited generalization to recent articles.
Contribution
First application of text classification to seismology articles, demonstrating potential and limitations of machine learning in analyzing earthquake predictability research trends.
Findings
Naive Bayes achieved 86% accuracy in binary classification.
Multiclass classification reached up to 78% accuracy.
Generalization to more recent articles was weak, with an F1-score of 60%.
Abstract
Text analytics based on supervised machine learning classifiers has shown great promise in a multitude of domains, but has yet to be applied to seismology. We test various standard models (Naive Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) on a seismological corpus of 100 articles related to the topic of precursory accelerating seismicity, spanning from 1988 to 2010. This corpus was labelled in Mignan (2011) according to whether the precursor was explained by critical processes (i.e., cascade triggering) or by other processes (such as signature of main fault loading). We investigate whether the classification process can be automated to help analyze larger corpora in order to better understand trends in earthquake predictability research. We find that the Naive Bayes model performs best, in agreement with the machine learning literature for the case of small datasets,…
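To make the binary classification task in the abstract concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with Laplace smoothing, written in plain Python. It is not the paper's pipeline: the four training sentences are invented for illustration and do not come from the corpus, the label names `critical` and `loading` are assumed stand-ins for the two classes from Mignan (2011), and the whitespace tokenizer and the helper names `tokenize`, `train_nb`, and `predict_nb` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep purely alphabetic tokens (a deliberately crude tokenizer).
    return [w for w in text.lower().split() if w.isalpha()]


def train_nb(docs, labels, alpha=1.0):
    # Count documents per class and word occurrences per class.
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c, n_c in class_counts.items():
        model["log_prior"][c] = math.log(n_c / len(docs))
        # Laplace (add-alpha) smoothing so unseen words do not zero out a class.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict_nb(model, doc):
    # Words outside the training vocabulary are ignored.
    tokens = [t for t in tokenize(doc) if t in model["vocab"]]
    scores = {
        c: model["log_prior"][c] + sum(model["log_lik"][c][t] for t in tokens)
        for c in model["log_prior"]
    }
    return max(scores, key=scores.get)


# Toy training set: invented sentences, NOT taken from the 100-article corpus.
train_docs = [
    "cascade triggering of aftershocks produces accelerating seismicity",
    "critical point behaviour and cascade of triggered events",
    "stress loading on the main fault produces accelerating moment release",
    "main fault loading and stress accumulation before rupture",
]
train_labels = ["critical", "critical", "loading", "loading"]  # assumed label names

model = train_nb(train_docs, train_labels)
pred_a = predict_nb(model, "triggering cascade of events")
pred_b = predict_nb(model, "fault loading and stress")
print(pred_a, pred_b)
```

With so few training documents, the smoothing parameter `alpha` has a large influence on the result, which is one reason simple models such as Naive Bayes are often preferred when the corpus is small.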
