Exploring Aviation Incident Narratives Using Topic Modeling and Clustering Techniques
Aziida Nanyonga, Hassan Wasswa, Ugur Turhan, Keith Joiner, and Graham Wild

TL;DR
This study applies multiple NLP topic modeling and clustering techniques to analyze aviation incident narratives, uncovering latent themes and patterns to enhance safety investigations.
Contribution
It introduces a comparative analysis of various topic modeling methods on aviation incident data, demonstrating their effectiveness in extracting meaningful insights.
Findings
LDA achieved the highest coherence score of 0.597
Identified recurring themes in incident narratives
K-means clustering revealed incident groupings based on shared characteristics
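The clustering finding above can be illustrated with a minimal sketch, assuming TF-IDF features and scikit-learn's `KMeans`. The narratives below are invented stand-ins, not NTSB records, and the choice of two clusters is an assumption for illustration; the summary does not state the study's feature representation or cluster count.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in narratives; NOT taken from the NTSB dataset.
narratives = [
    "engine lost power during climb and the pilot made a forced landing",
    "fuel exhaustion caused a total loss of engine power in cruise",
    "pilot reported engine failure and fuel starvation after takeoff",
    "airplane veered off the runway during landing in a gusting crosswind",
    "loss of directional control on the runway during the landing roll",
    "hard landing on the runway after the pilot misjudged the flare",
]

# Represent each narrative as a TF-IDF vector (assumed representation).
X = TfidfVectorizer(stop_words="english").fit_transform(narratives)

# Group narratives into an assumed number of clusters (2 here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

for label, text in zip(labels, narratives):
    print(label, text)
```

On a real corpus the number of clusters would normally be chosen with a diagnostic such as the elbow method or silhouette score rather than fixed in advance.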
Abstract
Aviation safety is a global concern, requiring detailed investigations into incidents to understand contributing factors comprehensively. This study uses the National Transportation Safety Board (NTSB) dataset. It applies advanced natural language processing (NLP) techniques, including Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), and K-means clustering. The main objectives are identifying latent themes, exploring semantic relationships, assessing probabilistic connections, and clustering incidents based on shared characteristics. This research contributes to aviation safety by providing insights into incident narratives and demonstrating the versatility of NLP and topic modelling techniques in extracting valuable information from complex datasets. The results, including topics…
Taxonomy
Topics: Computational and Text Analysis Methods · Technology and Data Analysis · Big Data Technologies and Applications
Methods: k-Means Clustering · Latent Dirichlet Allocation
