BrainStorm @ iREL at #SMM4H 2024: Leveraging Translation and Topical Embeddings for Annotation Detection in Tweets
Manav Chaudhary, Harshit Gupta, Vasudeva Varma

TL;DR
This paper presents BrainStorm @ iREL's approach to the SMM4H 2024 shared task: using translation and topical embeddings to distinguish human annotations from LLM annotations of COVID-19 symptom tweets, with the aim of making annotated data more trustworthy.
Contribution
It presents a new approach combining translation and topical embeddings to identify annotation sources in tweets, addressing trust issues in LLM-generated data.
Findings
Effective in distinguishing human from LLM annotations
Improves trustworthiness of annotated COVID-19 tweet data
Leverages topical information for annotation classification
Abstract
The proliferation of LLMs in various NLP tasks has sparked debates regarding their reliability, particularly in annotation tasks where biases and hallucinations may arise. In this shared task, we address the challenge of distinguishing annotations made by LLMs from those made by human domain experts in the context of COVID-19 symptom detection from tweets in Latin American Spanish. This paper presents BrainStorm @ iREL's approach to the SMM4H 2024 Shared Task. Leveraging the inherent topical information in tweets, we propose a novel approach to identify and classify annotations, aiming to enhance the trustworthiness of annotated data.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
