LT4SG@SMM4H24: Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models
Dasun Athukoralage, Thushari Atapattu, Menasha Thilakaratne, and Katrina Falkner

TL;DR
This paper explores the use of pre-trained language models, specifically a single fine-tuned RoBERTa-large model and an ensemble of three fine-tuned BERTweet-large models, for classifying tweets that report children's medical disorders, in support of digital epidemiology.
Contribution
It introduces an ensemble of three fine-tuned BERTweet-large models that outperforms a single fine-tuned RoBERTa-large model on test data when classifying relevant tweets.
Findings
The BERTweet-large ensemble achieves an F1-score of 0.938 on test data.
The ensemble outperforms the single RoBERTa-large model on test data, although both approaches perform identically on validation data.
The approach outperforms the benchmark classifier by 1.18%.
Abstract
This paper presents our approaches for the SMM4H24 Shared Task 5 on the binary classification of English tweets reporting children's medical disorders. Our first approach involves fine-tuning a single RoBERTa-large model, while the second approach entails ensembling the results of three fine-tuned BERTweet-large models. We demonstrate that although both approaches exhibit identical performance on validation data, the BERTweet-large ensemble excels on test data. Our best-performing system achieves an F1-score of 0.938 on test data, outperforming the benchmark classifier by 1.18%.
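The abstract does not say how the results of the three BERTweet-large models are combined, so the following is only a minimal sketch assuming simple majority voting over binary labels, together with the standard F1 computation used to score the result. The function names, the example predictions, and the gold labels are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model binary labels (0/1) by majority vote, tweet by tweet.

    ASSUMPTION: majority voting is one common ensembling rule; the paper's
    abstract does not state which rule the authors used.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("every model must label the same number of tweets")
    n_models = len(predictions)
    # zip(*predictions) yields, for each tweet, the labels from all models.
    # With an odd number of models and binary labels there are no ties.
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*predictions)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical predictions from three fine-tuned models on five tweets.
model_a = [1, 0, 1, 1, 0]
model_b = [1, 1, 0, 1, 0]
model_c = [0, 0, 1, 1, 0]
gold = [1, 0, 1, 0, 0]  # hypothetical gold labels

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)                  # [1, 0, 1, 1, 0]
print(f1_score(gold, ensemble))  # 0.8
```

Using an odd number of models, as in the three-model ensemble described above, keeps a binary majority vote free of ties. Averaging predicted probabilities before thresholding is a common alternative.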
Taxonomy
Topics: Pediatric health and respiratory diseases · Respiratory viral infections research · Child and Adolescent Health
