Regular Expressions for Fast-response COVID-19 Text Classification
Igor L. Markov, Jacqueline Liu, Adam Vagner

TL;DR
This paper presents a regular expression-based approach for fast, multilingual COVID-19 text classification that achieves high precision and recall without labeled data, enabling rapid response and easy updates.
Contribution
It introduces a methodology for building high-precision, multilingual regular expression classifiers for COVID-19 without requiring labeled data, demonstrating advantages over neural network models.
Findings
99% precision with recall above 50% across 66 languages, and precision and recall both above 90% for the 11 most common languages.
Regular expressions enable low-latency, explainable classification.
Faster updates and revisions compared to neural network classifiers.
Abstract
Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper introduces infrastructure and methodologies for text classifiers based on large-scale regular expressions. In particular, we describe how Facebook determines if a given piece of text - anything from a hashtag to a post - belongs to a narrow topic such as COVID-19. To fully define a topic and evaluate classifier performance we employ human-guided iterations of keyword discovery, but do not require labeled data. For COVID-19, we build two sets of regular expressions: (1) for 66 languages, with 99% precision and recall >50%, (2) for the 11 most common languages, with precision >90% and recall >90%. Regular expressions enable low-latency queries from multiple platforms. Response to challenges like COVID-19 is fast and so are revisions. Comparisons to a DNN…
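To make the idea of per-language regular expression classification concrete, here is a minimal sketch in Python. It is not the paper's implementation: the patterns, the language codes, and the names `PATTERNS`, `COMPILED`, and `classify` are hypothetical and chosen only for illustration. Returning the matched substring alongside the decision shows why this style of classifier is easy to explain and to revise.

```python
import re

# Hypothetical, illustrative keyword patterns keyed by language code.
# These are assumptions for the sketch, not the expressions used in the paper.
PATTERNS = {
    "en": r"\b(?:covid(?:[-\s]?19)?|coronavirus|sars[-\s]?cov[-\s]?2)\b",
    "es": r"\b(?:covid(?:[-\s]?19)?|coronavirus)\b",
}

# Compile once so that each query is a single low-latency search.
COMPILED = {lang: re.compile(p, re.IGNORECASE) for lang, p in PATTERNS.items()}


def classify(text, lang):
    """Return (is_on_topic, matched_substring) for the given language code.

    The matched substring acts as the explanation for a positive decision.
    Languages without a pattern are treated as not on topic.
    """
    regex = COMPILED.get(lang)
    if regex is None:
        return False, None
    match = regex.search(text)
    if match is None:
        return False, None
    return True, match.group(0)


if __name__ == "__main__":
    for sample in ["Stay safe during #COVID19", "I love covidien"]:
        print(sample, "->", classify(sample, "en"))
```

In this sketch the word boundaries keep a keyword from firing inside a longer unrelated word, and revising the classifier amounts to editing one pattern string and recompiling.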
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Natural Language Processing Techniques
