Is BERTopic Better than PLSA for Extracting Key Topics in Aviation Safety Reports?
Aziida Nanyonga, Joiner Keith, Turhan Ugur, and Wild Graham

TL;DR
This study compares BERTopic and PLSA for extracting key topics from aviation safety reports. It finds BERTopic more effective, with better topic coherence and interpretability, and attributes the gain to its transformer-based embeddings and hierarchical clustering.
Contribution
The paper demonstrates that BERTopic outperforms PLSA on a dataset of over 36,000 NTSB aviation safety reports (2000 to 2020), highlighting the benefits of transformer-based topic modeling.
Findings
BERTopic achieved higher topic coherence than PLSA (Cv score of 0.41 vs. 0.37)
BERTopic was rated more interpretable by safety experts
Transformer-based embeddings improve aviation incident data analysis
Abstract
This study compares the effectiveness of BERTopic and Probabilistic Latent Semantic Analysis (PLSA) in extracting meaningful topics from aviation safety reports, aiming to enhance the understanding of patterns in aviation incident data. Using a dataset of over 36,000 National Transportation Safety Board (NTSB) reports from 2000 to 2020, BERTopic employed transformer-based embeddings and hierarchical clustering, while PLSA utilized probabilistic modelling through the Expectation-Maximization (EM) algorithm. Results showed that BERTopic outperformed PLSA in topic coherence, achieving a Cv score of 0.41 compared to PLSA's 0.37, while also demonstrating superior interpretability as validated by aviation safety experts. These findings underscore the advantages of modern transformer-based approaches in analyzing complex aviation datasets, paving the way for enhanced insights and informed…
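To make the PLSA side of the comparison concrete, the following is a minimal sketch of the EM updates for PLSA in plain Python. It is not the authors' implementation: the function name `plsa`, its parameters, and the four toy documents are illustrative assumptions, and the tokens are invented rather than drawn from the NTSB reports. The E-step computes the posterior P(z|d,w) proportional to P(z|d)·P(w|z); the M-step re-estimates P(w|z) and P(z|d) from the expected counts.

```python
import random
from collections import Counter


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def plsa(docs, n_topics=2, n_iter=50, seed=0):
    """Fit PLSA by EM on tokenized documents (hypothetical helper).

    Returns (vocab, p_w_z, p_z_d), where p_w_z[z][w] = P(w|z)
    and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # counts[d][w] = n(d, w), the number of times word w occurs in document d
    counts = [Counter(index[w] for w in doc) for doc in docs]

    # Random initialisation of both conditional distributions
    p_w_z = [normalize([rng.random() + 0.01 for _ in vocab])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.01 for _ in range(n_topics)])
             for _ in docs]

    for _ in range(n_iter):
        new_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n_dw in doc_counts.items():
                # E-step: posterior over topics for this (document, word) pair
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step
                for z in range(n_topics):
                    new_w_z[z][w] += n_dw * post[z]
                    new_z_d[d][z] += n_dw * post[z]
        # M-step: renormalise expected counts into probabilities
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return vocab, p_w_z, p_z_d


# Toy usage with invented tokens (not data from the study)
toy_docs = [
    ["engine", "failure", "fuel"],
    ["runway", "landing", "gear"],
    ["engine", "fuel", "fuel"],
    ["landing", "runway", "runway"],
]
vocab, p_w_z, p_z_d = plsa(toy_docs, n_topics=2, n_iter=30, seed=1)
for z, row in enumerate(p_w_z):
    top = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)[:3]
    print(f"topic {z}:", [vocab[i] for i in top])
```

Because PLSA works on raw word counts, it has no notion of semantic similarity between different words, which is the gap the embedding-based clustering in BERTopic is meant to close.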
Taxonomy
Topics: Air Traffic Management and Optimization · Occupational Health and Safety Research · Topic Modeling
