Detecting Foodborne Illness Complaints in Multiple Languages Using English Annotations Only
Ziyi Liu, Giannis Karamanolakis, Daniel Hsu, Luis Gravano

TL;DR
This paper proposes a multilingual foodborne illness detection method using English annotations only, leveraging machine translation and joint training with mBERT to improve cross-lingual performance in social media reviews.
Contribution
It introduces a novel approach combining machine translation and multilingual BERT training to detect foodborne illness complaints across multiple languages without additional annotations.
Findings
Machine translation improves cross-lingual detection accuracy.
Joint training with translated data enhances performance in target languages.
Effective detection is demonstrated on Yelp reviews in seven languages.
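The joint-training idea in the findings above can be illustrated with a minimal sketch: each annotated English review is paired with machine-translated copies that inherit its label, and the combined set would then be used to fine-tune a multilingual classifier such as mBERT. This is an illustration under stated assumptions, not the authors' implementation. The names `Example`, `build_joint_training_set`, and `toy_translate` are hypothetical, the sample reviews are invented, and `toy_translate` is a stand-in for a real machine translation system.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Example:
    text: str
    label: int  # 1 = foodborne illness complaint, 0 = other (assumed encoding)
    lang: str


def build_joint_training_set(
    english: List[Example],
    target_langs: List[str],
    translate: Callable[[str, str], str],
    seed: int = 0,
) -> List[Example]:
    """Combine English examples with translated copies that keep the English label."""
    joint = list(english)
    for lang in target_langs:
        for ex in english:
            # The label is copied unchanged: no target-language annotation is needed.
            joint.append(Example(translate(ex.text, lang), ex.label, lang))
    # Shuffle so that training batches mix languages.
    random.Random(seed).shuffle(joint)
    return joint


def toy_translate(text: str, lang: str) -> str:
    """Placeholder for a machine translation call (assumption, not a real MT system)."""
    return f"[{lang}] {text}"


# Invented sample data, for illustration only.
english = [
    Example("Got sick after eating the oysters here.", 1, "en"),
    Example("Great tacos and friendly staff.", 0, "en"),
]
joint = build_joint_training_set(english, ["es", "zh"], toy_translate)
```

In practice, `joint` would be tokenized and passed to a multilingual sequence classifier; that step is omitted here because the summary does not specify the training configuration.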
Abstract
Health departments have been deploying text classification systems for the early detection of foodborne illness complaints in social media documents such as Yelp restaurant reviews. Current systems have been successfully applied for documents in English and, as a result, a promising direction is to increase coverage and recall by considering documents in additional languages, such as Spanish or Chinese. Training previous systems for more languages, however, would be expensive, as it would require the manual annotation of many documents for each new target language. To address this challenge, we consider cross-lingual learning and train multilingual classifiers using only the annotations for English-language reviews. Recent zero-shot approaches based on pre-trained multi-lingual BERT (mBERT) have been shown to effectively align languages for aspects such as sentiment. Interestingly, we…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies
Methods: Linear Layer · mBERT · WordPiece · Adam · Softmax · Multi-Head Attention · Layer Normalization · Dense Connections · Dropout · Linear Warmup With Linear Decay
