FactFinders at CheckThat! 2024: Refining Check-worthy Statement Detection with LLMs through Data Pruning

Yufeng Li; Rrubaa Panchendrarajan; Arkaitz Zubiaga

arXiv:2406.18297·cs.CL·June 27, 2024·1 cites

TL;DR

This paper explores open-source large language models combined with data pruning to improve check-worthy statement detection in political transcriptions, achieving competitive performance with only 44% of the training data.

Contribution

It introduces a two-step data pruning method and evaluates eight open-source LLMs for check-worthiness detection, achieving state-of-the-art results with reduced training data.

Findings

01. Competitive performance with only 44% of the training data

02. Ranked first in check-worthiness estimation at CheckThat! 2024

03. Effective use of open-source LLMs with data pruning

Abstract

The rapid dissemination of information through social media and the Internet has posed a significant challenge for fact-checking, not least in identifying check-worthy claims that fact-checkers should pay attention to, i.e., filtering claims that need fact-checking from a large pool of sentences. This challenge has stressed the need to focus on determining the priority of claims, specifically which claims are worth fact-checking. Despite advancements in this area in recent years, the application of large language models (LLMs), such as GPT, has only recently drawn attention in studies. However, many open-source LLMs remain underexplored. Therefore, this study investigates the application of eight prominent open-source LLMs with fine-tuning and prompt engineering to identify check-worthy statements from political transcriptions. Further, we propose a two-step data pruning approach…

Code & Models

Repositories

isyufeng/FactFinders (PyTorch, official)

Taxonomy

Topics: Research Data Management Practices · Data Quality and Management

Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Attention Dropout · Dropout · Adam · Linear Layer · Dense Connections