Multi-class Twitter Data Categorization and Geocoding with a Novel Computing Framework
Sakib Mahmud Khan, Mashrur Chowdhury, Linh B. Ngo, Amy Apon

TL;DR
This paper presents a novel computing framework combining L-LDA and SVM to classify and geocode transportation-related Twitter data with high accuracy, demonstrated through a case study in New York City.
Contribution
The study introduces a new analytical framework that integrates L-LDA with SVM for improved transportation data classification from Twitter.
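To make the L-LDA half of the framework concrete, here is a minimal sketch of the standard Labeled LDA formulation. It uses one topic per label and collapsed Gibbs sampling, in which each token may only be assigned to one of its own document's labels. The function name train_llda, the hyperparameters alpha, beta and iters, and the toy tweets and labels are all hypothetical. How the paper feeds L-LDA output into its SVM is not described here, so this sketch stops at the per-label word distributions.

```python
import random
from collections import defaultdict


def train_llda(docs, doc_labels, labels, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for Labeled LDA.

    docs: list of token lists; doc_labels: list of non-empty label sets;
    labels: all label names. Returns {label: {word: probability}}.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    n_kw = {k: defaultdict(int) for k in labels}  # label-word counts
    n_k = {k: 0 for k in labels}                   # tokens assigned to each label
    n_dk = [defaultdict(int) for _ in docs]        # per-document label counts
    z = []                                         # current label of every token
    # Initialise each token with a random label drawn from its document's labels.
    for d, (tokens, labs) in enumerate(zip(docs, doc_labels)):
        allowed = sorted(labs)
        zd = []
        for w in tokens:
            k = rng.choice(allowed)
            zd.append(k)
            n_kw[k][w] += 1
            n_k[k] += 1
            n_dk[d][k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, tokens in enumerate(docs):
            allowed = sorted(doc_labels[d])
            for i, w in enumerate(tokens):
                k = z[d][i]
                # Remove the token's current assignment from all counts.
                n_kw[k][w] -= 1
                n_k[k] -= 1
                n_dk[d][k] -= 1
                # L-LDA restriction: resample only among this document's labels.
                weights = [
                    (n_dk[d][j] + alpha) * (n_kw[j][w] + beta) / (n_k[j] + V * beta)
                    for j in allowed
                ]
                k = rng.choices(allowed, weights=weights)[0]
                z[d][i] = k
                n_kw[k][w] += 1
                n_k[k] += 1
                n_dk[d][k] += 1
    # Smoothed per-label word distributions.
    return {
        k: {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in labels
    }


# Toy usage with invented tweets and labels.
docs = [
    ["bridge", "closed", "traffic"],
    ["accident", "lane", "blocked"],
    ["subway", "delay", "signal"],
    ["train", "delay", "traffic"],
]
doc_labels = [{"traffic"}, {"incident"}, {"transit"}, {"transit", "traffic"}]
labels = ["incident", "traffic", "transit"]
phi = train_llda(docs, doc_labels, labels)
```

Restricting the sampler to each document's labels is what distinguishes L-LDA from unsupervised LDA. A single-label document, for example, has every token pinned to that label.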
Findings
The SVM classifier achieves more than 85% accuracy in identifying transportation-related tweets.
The L-LDA-incorporated SVM achieves more than 98.3% classification accuracy.
The framework effectively geocodes and categorizes transportation events from Twitter data.
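As an illustration of the binary step (transportation-related vs. other), here is a minimal sketch of a linear SVM on bag-of-words features, trained with the Pegasos stochastic sub-gradient method on the L2-regularised hinge loss. This is a generic stand-in, not the authors' classifier or feature set. The helper names, the hyperparameters lam and epochs, and the toy tweets are hypothetical, and the bias term is omitted for brevity.

```python
import random
from collections import Counter


def featurize(text):
    """Bag-of-words term counts (lower-cased, whitespace-tokenised)."""
    return Counter(text.lower().split())


def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(texts, y, lam=0.01, epochs=50, seed=0):
    """Hypothetical sketch: Pegasos on the L2-regularised hinge loss.

    y holds +1 (transportation-related) or -1 (other). No bias term.
    """
    rng = random.Random(seed)
    X = [featurize(t) for t in texts]
    w = {}
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1  # is the hinge loss active?
            # Regulariser step: shrink every weight by (1 - eta * lam).
            for f in w:
                w[f] *= 1.0 - eta * lam
            if violated:
                # Hinge sub-gradient step towards classifying x_i correctly.
                for f, v in X[i].items():
                    w[f] = w.get(f, 0.0) + eta * y[i] * v
    return w


def predict(w, text):
    return 1 if dot(w, featurize(text)) > 0 else -1


# Invented toy training data.
train_texts = [
    "heavy traffic on bridge",
    "accident blocks lane on highway",
    "subway delay at station",
    "bus service suspended",
    "great pizza tonight",
    "watching the game with friends",
    "new album out today",
    "happy birthday to my sister",
]
train_y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(train_texts, train_y)
print(predict(w, "traffic accident on highway bridge"))  # 1
print(predict(w, "pizza with friends tonight"))          # -1
```

A linear kernel is a common choice for sparse text features. A production pipeline would more likely use a library implementation with n-gram or TF-IDF features.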
Abstract
This study details progress in transportation data analysis with a novel computing framework that keeps pace with the continuous evolution of computing technology. The framework combines a Labelled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with a supporting computing strategy, applied to publicly available Twitter data, to identify transportation-related events and provide reliable information to travelers. The analytical approach classifies tweets with text classification and geocodes locations based on string similarity. A case study of New York City and its surrounding areas demonstrates the feasibility of the approach: approximately 700,010 tweets collected over one week are analyzed to extract relevant transportation-related information. The SVM classifier achieves more than 85% accuracy in identifying…
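The abstract states that locations are geocoded "based on string similarity". A minimal sketch of that idea matches a location mention from a tweet against a gazetteer and keeps the closest name above a threshold. The similarity measure used here, Python's difflib SequenceMatcher ratio, is an assumption; the paper may use a different metric. The gazetteer, its approximate coordinates, the 0.8 threshold and the function name geocode are all hypothetical and for illustration only.

```python
from difflib import SequenceMatcher

# Hypothetical mini-gazetteer: place name -> (lat, lon), approximate values for illustration.
GAZETTEER = {
    "brooklyn bridge": (40.706, -73.997),
    "george washington bridge": (40.851, -73.952),
    "holland tunnel": (40.727, -74.021),
    "lincoln tunnel": (40.762, -74.011),
}


def geocode(mention, gazetteer=GAZETTEER, threshold=0.8):
    """Return (name, (lat, lon), score) for the most similar gazetteer entry,
    or None if no entry reaches the similarity threshold."""
    query = " ".join(mention.lower().split())  # normalise case and whitespace
    best_name, best_score = None, 0.0
    for name in gazetteer:
        score = SequenceMatcher(None, query, name).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None
    return best_name, gazetteer[best_name], best_score


print(geocode("Brooklyn Brdge"))  # misspelling still resolves to "brooklyn bridge"
print(geocode("Times Square"))    # None: not in this toy gazetteer
```

The threshold trades recall for precision. A low value tolerates more misspellings in tweets but risks snapping unrelated mentions to the nearest gazetteer entry.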
