Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses

Son Doan; Lucila Ohno-Machado; Nigel Collier

arXiv:1210.0848·cs.SI·November 17, 2016

TL;DR

This paper presents a simple semantic filtering method for Twitter messages related to influenza-like illness (ILI). Applied to 587 million messages, it reaches a Pearson correlation of 98.46% with actual influenza data, 3.98% higher than the previous state-of-the-art method.

Contribution

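The authors introduce a filtering approach for influenza surveillance from Twitter data that first selects messages by syndrome keywords from the BioCaster Ontology and then filters them by semantic features (negation, hashtags, emoticons, humor and geography), outperforming the previous state-of-the-art method.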

Findings

01

Achieved a Pearson correlation of 98.46% (p-value < 2.2e-16) with actual influenza data

02

Improved correlation by 3.98% over previous state-of-the-art

03

Demonstrated effectiveness of simple NLP enhancements in social media analysis
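
To make the headline metric concrete, here is a minimal sketch of the Pearson correlation coefficient between two weekly series, for example filtered tweet counts and reported ILI rates. The function name `pearson_r` and the toy numbers are illustrative assumptions; they are not the paper's data or code.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values only (hypothetical), standing in for weekly aggregates.
    weekly_tweet_counts = [120.0, 340.0, 560.0, 410.0, 200.0]
    weekly_ili_rates = [1.1, 2.9, 4.8, 3.7, 1.9]
    print(f"r = {pearson_r(weekly_tweet_counts, weekly_ili_rates):.4f}")
```

A value reported as 98.46% corresponds to r = 0.9846 on this scale, computed over the weekly data points of the study period.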

Abstract

Systems that exploit publicly available user-generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel filtering method for Influenza-Like Illness (ILI)-related messages using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks for the US 2009 influenza season, from 30th August 2009 to 8th May 2010. Results showed that our system achieved the highest Pearson correlation coefficient of 98.46% (p-value < 2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches to mine…
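
The two-stage filtering described in the abstract could be sketched roughly as follows. Everything concrete here is an assumption made for illustration: the keyword list is a tiny stand-in and is not taken from the BioCaster Ontology, and the negation, humor, emoticon, hashtag and geography rules are simplified guesses at what such features might look like, not the authors' actual rules. The `Tweet` class and all function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for syndrome keywords (NOT the BioCaster Ontology).
SYNDROME_KEYWORDS = ("flu", "fever", "cough", "sore throat")
KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SYNDROME_KEYWORDS) + r")\b",
    re.IGNORECASE,
)
# Simplified, assumed cue lists for the semantic features.
NEGATION_RE = re.compile(r"\b(no|not|never|don't|dont|without)\b", re.IGNORECASE)
HUMOR_RE = re.compile(r"\b(lol|haha+|lmao)\b", re.IGNORECASE)
HASHTAG_RE = re.compile(r"#\w+")
POSITIVE_EMOTICONS = (":)", ":-)", ":D", ";)")


@dataclass
class Tweet:
    text: str
    country: str  # assumed to be resolved upstream, e.g. from profile location


def keyword_filter(tweet: Tweet) -> bool:
    """Stage 1: keep messages that mention a syndrome keyword."""
    return KEYWORD_RE.search(tweet.text) is not None


def semantic_filter(tweet: Tweet) -> bool:
    """Stage 2: drop messages that the semantic cues mark as unlikely ILI reports."""
    if tweet.country != "US":  # geography: the study covers the US season
        return False
    if NEGATION_RE.search(tweet.text):  # negation: "I do not have the flu"
        return False
    if HUMOR_RE.search(tweet.text):  # humor: joking use of illness terms
        return False
    if any(e in tweet.text for e in POSITIVE_EMOTICONS):  # emoticons
        return False
    # Hashtags: require a keyword outside of hashtags (assumed rule).
    return KEYWORD_RE.search(HASHTAG_RE.sub(" ", tweet.text)) is not None


def is_ili_candidate(tweet: Tweet) -> bool:
    """Apply the keyword stage, then the semantic stage."""
    return keyword_filter(tweet) and semantic_filter(tweet)


if __name__ == "__main__":
    samples = [
        Tweet("I have a fever and a bad cough", "US"),
        Tweet("I do not have the flu", "US"),
        Tweet("bieber fever lol", "US"),
    ]
    for t in samples:
        print(is_ili_candidate(t), "-", t.text)
```

Counting the messages that pass both stages per week would give a series that can then be compared against reported influenza data.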
