Analyzing Aviation Safety Narratives with LDA, NMF and PLSA: A Case Study Using Socrata Datasets

Aziida Nanyonga; Graham Wild

arXiv:2501.01690·cs.LG·January 6, 2025

Open Access

TL;DR

This paper applies LDA, NMF, and PLSA topic modeling techniques to analyze aviation safety narratives from Socrata datasets, revealing key themes and insights to improve safety protocols.

Contribution

It compares the effectiveness of three topic modeling methods on aviation safety data, highlighting their strengths and providing a foundation for advanced safety analysis.

Findings

1. LDA achieved the highest coherence score (0.36).

2. NMF produced distinct, highly interpretable topics.

3. PLSA offered nuanced probabilistic insights, though its topics were harder to interpret.
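
The findings rank the models by coherence, but this page does not say which coherence measure was used. As a rough illustration only, the sketch below assumes the UMass variant, which scores a topic by how often its top words co-occur in the same documents. The function name `umass_coherence` and the toy narratives are hypothetical and are not taken from the paper or its dataset.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents, eps=1.0):
    """Average UMass coherence of one topic (assumed measure, see lead-in).

    top_words: the topic's words, most probable first.
    documents: list of tokenized documents (lists of strings).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i, j in combinations(range(len(top_words)), 2):
        earlier, later = top_words[i], top_words[j]
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # word never occurs; the pair is undefined, so skip it
        scores.append(math.log((doc_freq(earlier, later) + eps) / denom))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy narratives, invented for this example.
docs = [
    ["pilot", "error", "landing"],
    ["pilot", "error", "fatigue"],
    ["engine", "failure", "fire"],
    ["engine", "failure", "landing"],
]

coherent_topic = umass_coherence(["pilot", "error"], docs)
mixed_topic = umass_coherence(["pilot", "engine"], docs)
print(coherent_topic, mixed_topic)
```

Words that tend to appear together ("pilot", "error") score higher than words that never share a document ("pilot", "engine"). Scores from different coherence measures are not comparable, so these values should not be read against the 0.36 reported above.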

Abstract

This study applies three topic modelling techniques, Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Probabilistic Latent Semantic Analysis (PLSA), to the Socrata dataset, which spans 1908 to 2009. With narratives categorized by operator type (military, commercial, and private), the analysis identified key themes such as pilot error, mechanical failure, weather conditions, and training deficiencies. The study highlights the distinct strengths of each method: LDA's ability to uncover overlapping themes, NMF's production of distinct and interpretable topics, and PLSA's nuanced probabilistic insights despite its interpretative complexity. Statistical analysis revealed that PLSA achieved a coherence score of 0.32 and a perplexity value of -4.6, NMF scored 0.34 and 37.1, while LDA achieved the highest coherence of 0.36 but also recorded the highest perplexity at 38.2. These…
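
The abstract reports perplexity alongside coherence without defining it. Under the standard textbook definition, perplexity is the exponential of the negative mean per-token log-likelihood, so lower is better. The sketch below shows that definition; the helper name `perplexity` and the input values are hypothetical, and the paper's exact computation is not stated on this page.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of each held-out token."""
    n = len(token_log_probs)
    if n == 0:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / n)


# Hypothetical model that assigns probability 1/4 to every token:
# it is exactly as uncertain as a uniform choice among 4 words.
uniform_log_probs = [math.log(0.25)] * 10
uniform_perplexity = perplexity(uniform_log_probs)
print(uniform_perplexity)
```

Because token log-probabilities are never positive, perplexity under this definition is always at least 1. A negative figure such as the -4.6 reported for PLSA is therefore presumably on a different scale (for example, a per-word log-likelihood), though the source does not say which.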

Taxonomy

Topics: Occupational Health and Safety Research · Computational and Text Analysis Methods

Methods: Latent Dirichlet Allocation