Analyzing Aviation Safety Narratives with LDA, NMF and PLSA: A Case Study Using Socrata Datasets
Aziida Nanyonga, Graham Wild

TL;DR
This paper applies LDA, NMF, and PLSA topic modeling techniques to analyze aviation safety narratives from Socrata datasets, revealing key themes and insights to improve safety protocols.
Contribution
It compares the effectiveness of three topic modeling methods on aviation safety data, highlighting their strengths and providing a foundation for advanced safety analysis.
Findings
LDA achieved the highest coherence score, 0.36
NMF produced highly interpretable topics
PLSA offered nuanced probabilistic insights
Abstract
This study explores the application of three topic modelling techniques, Latent Dirichlet Allocation (LDA), Nonnegative Matrix Factorization (NMF), and Probabilistic Latent Semantic Analysis (PLSA), to the Socrata dataset spanning 1908 to 2009. With narratives categorized by operator type (military, commercial, and private), the analysis identified key themes such as pilot error, mechanical failure, weather conditions, and training deficiencies. The study highlights the unique strengths of each method: LDA's ability to uncover overlapping themes, NMF's production of distinct and interpretable topics, and PLSA's nuanced probabilistic insights despite interpretative complexity. Statistical analysis revealed that PLSA achieved a coherence score of 0.32 and a perplexity value of -4.6, NMF achieved a coherence of 0.34 and a perplexity of 37.1, while LDA achieved the highest coherence of 0.36 but also recorded the highest perplexity at 38.2. These…
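The abstract compares models by topic coherence, but this summary does not say which coherence measure the authors used. As a rough illustration only, the sketch below computes UMass coherence for a topic's top words over a handful of invented narratives. The function name `umass_coherence`, the toy documents, and the choice of UMass itself are assumptions made for this example, so its values are on a different scale from the 0.32 to 0.36 figures reported above.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence for one topic, given its ordered top words.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs with j < i,
    where D counts the documents containing all of the given words.
    """
    # Represent each document as a set of lowercase whitespace tokens.
    doc_sets = [set(doc.lower().split()) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denom = doc_freq(top_words[j])
            if denom == 0:
                continue  # skip words that never occur in the corpus
            co_occurrences = doc_freq(top_words[i], top_words[j])
            score += math.log((co_occurrences + 1) / denom)
    return score

# Hypothetical narratives, invented for illustration (not from the dataset).
narratives = [
    "pilot error during landing in poor weather",
    "engine failure after takeoff mechanical failure suspected",
    "pilot lost control in poor weather",
    "mechanical failure of engine during climb",
]

coherent_topic = umass_coherence(["engine", "failure", "mechanical"], narratives)
mixed_topic = umass_coherence(["engine", "weather", "pilot"], narratives)
print(coherent_topic > mixed_topic)
```

Words that tend to appear in the same narratives score higher than words drawn from unrelated themes, which is the intuition behind using coherence to rank the topics produced by LDA, NMF, and PLSA.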
Taxonomy
Topics: Occupational Health and Safety Research · Computational and Text Analysis Methods
Methods: Latent Dirichlet Allocation
