Automatic Query Optimization for Retrieving Traffic Tweets
Emory Hufbauer, Hana Khamfroush

TL;DR
This paper proposes new methods for automatically refining complex boolean search queries to improve the retrieval of relevant traffic-related tweets from Twitter's API, addressing challenges posed by tweet volume and brevity.
Contribution
It introduces updated automatic query optimization techniques specifically designed for Twitter's short, high-volume data environment, enhancing search precision and recall.
Findings
Preliminary results show improved retrieval of traffic incident tweets.
Manual classification confirms increased relevance of retrieved tweets.
Optimized queries demonstrate higher specificity in filtering traffic-related content.
Abstract
Twitter, like many social media and data brokering companies, makes its data available through a search API (application programming interface). In addition to filtering results by date and location, researchers can search for tweets with specific content using a boolean text query, in which AND, OR, and NOT operators select the combinations of phrases that must, or must not, appear in matching tweets. This boolean text search system is not unique to Twitter and is found in many other contexts, including academic, legal, and medical databases; however, it is stretched to its limits in Twitter's use case because of the relative volume and brevity of tweets. In addition, the semi-automated use of such systems was well studied under the topic of Information Retrieval during the 1980s and 1990s; however, the study of such systems has greatly declined since that…
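To make the boolean query idea concrete, the following is a minimal sketch of how an AND/OR/NOT query could be evaluated against tweet text and scored for precision and recall on a hand-labeled sample. It is not the authors' method and does not use Twitter's API: the nested-tuple query representation, the function names `matches` and `precision_recall`, the case-insensitive substring matching, and the sample tweets are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Hypothetical query representation: a phrase (str), or a tuple such as
# ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q).
Query = Union[str, tuple]


def matches(query: Query, text: str) -> bool:
    """Return True if `text` satisfies `query` (case-insensitive substring match)."""
    if isinstance(query, str):
        return query.lower() in text.lower()
    op, *args = query
    if op == "AND":
        return all(matches(arg, text) for arg in args)
    if op == "OR":
        return any(matches(arg, text) for arg in args)
    if op == "NOT":
        return not matches(args[0], text)
    raise ValueError(f"unknown operator: {op!r}")


def precision_recall(query: Query, labeled: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Score `query` against (text, is_relevant) pairs; returns (precision, recall)."""
    retrieved = [relevant for text, relevant in labeled if matches(query, text)]
    true_positives = sum(retrieved)
    total_relevant = sum(relevant for _, relevant in labeled)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    return precision, recall


# Made-up sample tweets with manual relevance labels (illustrative only).
sample = [
    ("Crash on I-75 northbound, two lanes blocked", True),
    ("Heavy traffic near downtown after accident", True),
    ("My laptop had a crash again", False),
    ("Website traffic is up this month", False),
]

broad = ("OR", "crash", "traffic")
refined = ("AND", ("OR", "crash", "accident", "traffic"),
           ("NOT", ("OR", "laptop", "website")))

print(precision_recall(broad, sample))
print(precision_recall(refined, sample))
```

On this toy sample, the broad query retrieves every tweet, including the two irrelevant ones, while the refined query uses NOT clauses to exclude them. Because tweets are so short, a single ambiguous word such as "crash" or "traffic" can decide whether a tweet matches, which is one way to see why query refinement matters in this setting.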
Taxonomy
Topics: Web Data Mining and Analysis · Natural Language Processing Techniques · Algorithms and Data Compression
