Semi-Supervised Anomaly Detection for the Determination of Vehicle Hijacking Tweets
Taahir Aiyoob Patel, Clement N. Nyirenda

TL;DR
This paper introduces a semi-supervised approach using anomaly detection algorithms to identify vehicle hijacking incidents from tweets, achieving high accuracy and F1-scores, with CBLOF slightly outperforming KNN.
Contribution
The work presents a novel semi-supervised method combining TF-IDF with anomaly detection algorithms for hijacking tweet detection, demonstrating effectiveness over traditional approaches.
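The source does not say which TF-IDF variant was used (tokenization, IDF smoothing, normalization). As a minimal stdlib-only sketch, assume raw term counts, the smoothed IDF ln((1 + n) / (1 + df)) + 1 that scikit-learn's `TfidfVectorizer` uses by default, and L2 normalization of each row. Under those assumptions, tweets could be turned into vectors as below. The `tokenize` and `tfidf` helpers and the sample tweets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one L2-normalised TF-IDF row per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so a term in every document keeps weight 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        # Guard against dividing by zero for a tweet with no tokens.
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows


# Hypothetical example tweets, not taken from the paper's dataset.
tweets = [
    "Vehicle hijacking reported on the N1",
    "Hijacking of a truck near Cape Town",
    "New hijacking movie trailer released",
]
vocab, X = tfidf(tweets)
print(len(vocab), len(X))
```

Because every sample tweet contains "hijacking", that term gets the minimum IDF of 1. Words that distinguish one tweet from the others are weighted more heavily.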
Findings
CBLOF achieved 90% accuracy and 0.8 F1-score
KNN achieved 89% accuracy and 0.78 F1-score
CBLOF was identified as the preferred method
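The findings report accuracy and F1-score, but the source does not state the positive class or any averaging scheme. The minimal sketch below assumes binary labels, with hijacking tweets as the positive class (1). The `accuracy_and_f1` helper and the toy labels are hypothetical and are not data from the paper.

```python
def accuracy_and_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[float, float]:
    """Binary accuracy and F1-score, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall; true negatives do not enter it.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy labels (hypothetical): 1 = hijacking tweet, 0 = other tweet.
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f} f1={f1:.2f}")  # accuracy=0.80 f1=0.67
```

With these toy labels, 8 of 10 predictions are correct, but precision and recall are both 2/3. Accuracy and F1 can therefore diverge, because F1 ignores the many correctly rejected negatives.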
Abstract
In South Africa, vehicle hijacking is an ever-growing problem, and travellers live in constant fear of becoming victims of such incidents. This work presents a new semi-supervised approach that identifies hijacking incidents from tweets using unsupervised anomaly detection algorithms. Tweets containing the keyword "hijacking" are collected, stored, and represented using term frequency-inverse document frequency (TF-IDF). They are then analyzed with two anomaly detection algorithms: 1) K-Nearest Neighbour (KNN); 2) Cluster-Based Local Outlier Factor (CBLOF). In the comparative evaluation, KNN achieved an accuracy of 89% and CBLOF an accuracy of 90%. CBLOF also obtained an F1-score of 0.8, compared with 0.78 for KNN. Therefore, there is a slight difference between the two approaches, in favour of…
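The abstract names the two detectors but not their parameters or implementation. The plain-Python sketch below rests on several assumptions. It uses Euclidean distance and a KNN score equal to the distance to the k-th nearest neighbour. CBLOF follows its original formulation: k-means clusters are split into "large" and "small" using `alpha` (the share of the data covered) and `beta` (the size ratio between consecutive clusters). Each point is then scored by its cluster's size times its distance to the nearest large-cluster centre. The farthest-point k-means initialisation, `k=2` for KNN, `k=3` clusters, `alpha=0.9`, `beta=5.0`, and the 2-D toy points are illustrative choices, not the paper's settings. In practice the inputs would be TF-IDF vectors.

```python
import math

Vector = list[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_scores(X: list[Vector], k: int = 2) -> list[float]:
    """KNN outlier score: distance from each point to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(X):
        others = sorted(dist(p, q) for j, q in enumerate(X) if j != i)
        scores.append(others[k - 1])
    return scores


def kmeans(X: list[Vector], k: int, iters: int = 20) -> tuple[list[int], list[Vector]]:
    """Lloyd's k-means with deterministic farthest-point initialisation (an assumption)."""
    centers = [list(X[0])]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        far = max(X, key=lambda x: min(dist(x, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(X)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(x, centers[c])) for x in X]
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the old centre if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def cblof_scores(X: list[Vector], k: int = 3, alpha: float = 0.9, beta: float = 5.0) -> list[float]:
    """CBLOF: cluster size times distance to the nearest large-cluster centre."""
    labels, centers = kmeans(X, k)
    sizes = [labels.count(c) for c in range(k)]
    order = sorted(range(k), key=lambda c: sizes[c], reverse=True)
    large, covered = set(), 0
    for pos, c in enumerate(order):
        large.add(c)
        covered += sizes[c]
        nxt = order[pos + 1] if pos + 1 < k else None
        # Stop once the large clusters hold at least alpha of the data,
        # or the size drop to the next cluster is at least a factor of beta.
        if covered >= alpha * len(X) or nxt is None or sizes[nxt] == 0:
            break
        if sizes[c] / sizes[nxt] >= beta:
            break
    scores = []
    for x, c in zip(X, labels):
        if c in large:
            d = dist(x, centers[c])
        else:
            d = min(dist(x, centers[j]) for j in large)
        scores.append(sizes[c] * d)
    return scores


# Toy 2-D points (hypothetical): two tight groups plus one isolated point at index 10.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5],
     [5, 20]]
knn = knn_scores(X, k=2)
cb = cblof_scores(X, k=3)
print(max(range(len(X)), key=knn.__getitem__), max(range(len(X)), key=cb.__getitem__))  # 10 10
```

Higher scores mean "more anomalous" under both detectors. The source does not state how scores are turned into hijacking/non-hijacking labels (for example, a threshold or an assumed contamination rate), so that step is left out here.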
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection · Data-Driven Disease Surveillance
