MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data

Mykhailo Poliakov; Nadiya Shvai

arXiv:2510.26345·cs.CL·October 31, 2025

TL;DR

MisSynth enhances the detection of scientific misinformation by generating synthetic fallacy data to fine-tune large language models, significantly improving classification accuracy with limited resources.

Contribution

The paper introduces MisSynth, a novel pipeline using retrieval-augmented generation to create synthetic data for fine-tuning LLMs in misinformation detection.
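As a rough illustration of the retrieval-augmented step, the sketch below ranks context passages against a claim and assembles a generation prompt. It is not the authors' implementation: the token-overlap retriever, the prompt wording, the helper names (`overlap_score`, `retrieve`, `build_prompt`), the example passages, and the fallacy label are all hypothetical stand-ins, and the call to an actual LLM is omitted.

```python
import re
from collections import Counter


def tokens(text: str) -> Counter:
    """Lowercased word counts; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, passage: str) -> int:
    """Number of tokens shared by query and passage (multiset intersection)."""
    return sum((tokens(query) & tokens(passage)).values())


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest token overlap with the query."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(claim: str, fallacy_class: str, passages: list[str], k: int = 2) -> str:
    """Assemble a prompt asking a generator for one synthetic fallacy sample."""
    lines = ["Context passages:"]
    lines += [f"- {p}" for p in retrieve(claim, passages, k)]
    lines.append(f"Claim: {claim}")
    lines.append(
        f"Write one fallacious premise of type '{fallacy_class}' "
        "that links the context to the claim."
    )
    return "\n".join(lines)


# Hypothetical inputs, invented for this sketch (not taken from MISSCI).
passages = [
    "The study measured vitamin D levels in 40 hospitalized patients.",
    "Rainfall in the region increased during the spring season.",
    "Patients with low vitamin D levels had longer hospital stays.",
]
claim = "Vitamin D supplements prevent hospital stays."
prompt = build_prompt(claim, "Hasty Generalization", passages, k=2)
print(prompt)
```

In a fuller pipeline, the generated samples would be collected into a training set for fine-tuning; a dense retriever would normally replace the token-overlap scoring used here.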

Findings

01. Over 35% absolute F1-score improvement on the MISSCI test split with the fine-tuned LLaMA 3.1 8B model, compared with its vanilla baseline.

02. Synthetic data augmentation boosts zero-shot classification performance.

03. Significant accuracy gains with limited computational resources.

Abstract

Health-related misinformation is very prevalent and potentially harmful. It is difficult to identify, especially when claims distort or misinterpret scientific findings. We investigate the impact of synthetic data generation and lightweight fine-tuning techniques on the ability of large language models (LLMs) to recognize fallacious arguments using the MISSCI dataset and framework. In this work, we propose MisSynth, a pipeline that applies retrieval-augmented generation (RAG) to produce synthetic fallacy samples, which are then used to fine-tune an LLM. Our results show substantial accuracy gains with fine-tuned models compared to vanilla baselines. For instance, the fine-tuned LLaMA 3.1 8B model achieved an absolute F1-score improvement of over 35% on the MISSCI test split over its vanilla baseline. We demonstrate that introducing synthetic fallacy data to augment limited annotated…
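The "absolute" improvement quoted above is a difference in F1 measured in percentage points, which is not the same as a relative gain. The sketch below makes the distinction concrete on toy labels. The labels and predictions are invented for illustration, and macro averaging is an assumption, since the excerpt does not say which F1 averaging is used.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, assumed here)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for this sketch; not results from the paper.
y_true = ["A", "A", "B", "B"]
baseline_pred = ["A", "B", "A", "B"]
tuned_pred = ["A", "A", "B", "A"]

f1_base = macro_f1(y_true, baseline_pred)
f1_tuned = macro_f1(y_true, tuned_pred)

absolute_gain = f1_tuned - f1_base              # difference in F1 (points)
relative_gain = absolute_gain / f1_base         # fraction of the baseline score

print(f"baseline F1: {f1_base:.3f}, fine-tuned F1: {f1_tuned:.3f}")
print(f"absolute gain: {absolute_gain * 100:.1f} points")
print(f"relative gain: {relative_gain * 100:.1f}%")
```

On these toy labels the relative gain is twice the absolute gain, which is why it matters that the reported figure is stated as absolute.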


Code & Models

Datasets

mxpoliakov/MisSynth (dataset, 7 downloads)
