FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning
Yufeng Li, Rrubaa Panchendrarajan, Arkaitz Zubiaga

TL;DR
This paper explores the use of open-source large language models with data pruning techniques to improve check-worthy statement detection, achieving high performance with less training data in the context of social media misinformation filtering.
Contribution
It introduces a two-step data pruning method and evaluates eight open-source LLMs for check-worthiness detection, achieving state-of-the-art results with reduced training data.
Findings
Competitive performance with only 44% of training data
Ranked first in check-worthiness estimation at CheckThat! 2024
Effective use of open-source LLMs with data pruning
Abstract
The rapid dissemination of information through social media and the Internet has posed a significant challenge for fact-checking, including the identification of check-worthy claims that fact-checkers should pay attention to, i.e., filtering claims that need fact-checking from a large pool of sentences. This challenge has stressed the need to prioritize claims, specifically to determine which claims are worth fact-checking. Despite advancements in this area in recent years, the application of large language models (LLMs), such as GPT, has only recently drawn attention in studies, and many open-source LLMs remain underexplored. Therefore, this study investigates the application of eight prominent open-source LLMs with fine-tuning and prompt engineering to identify check-worthy statements from political transcriptions. Further, we propose a two-step data pruning approach…
Taxonomy
Topics: Research Data Management Practices · Data Quality and Management
Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Attention Dropout · Dropout · Adam · Linear Layer · Dense Connections
