MisSynth: Improving MISSCI Logical Fallacies Classification with Synthetic Data
Mykhailo Poliakov, Nadiya Shvai

TL;DR
MisSynth enhances the detection of scientific misinformation by generating synthetic fallacy data to fine-tune large language models, significantly improving classification accuracy with limited resources.
Contribution
The paper introduces MisSynth, a novel pipeline using retrieval-augmented generation to create synthetic data for fine-tuning LLMs in misinformation detection.
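The retrieve-then-prompt step of a retrieval-augmented generation pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words retriever, the prompt wording, the example passages, and the function names (`retrieve`, `build_prompt`) are all assumptions made for this example. The resulting prompt would then be sent to a generator LLM, which is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query (hypothetical retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(claim, context):
    """Assemble a generation prompt; the wording is illustrative only."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Source passages:\n" + joined + "\n\n"
        f"Claim: {claim}\n"
        "Write one fallacious premise that would link the passages to the "
        "claim, and name the fallacy class."
    )


# Made-up passages and claim, used only to exercise the sketch.
passages = [
    "The study measured antibody levels in mice after vaccination.",
    "Participants reported their diet in a weekly questionnaire.",
    "Antibody levels in vaccinated mice declined after six months.",
]
claim = "Vaccination stops working in humans after six months."

top = retrieve(claim, passages, k=2)
prompt = build_prompt(claim, top)
```

A production pipeline would more likely use dense embeddings than word overlap for retrieval; the word-count retriever here only keeps the sketch dependency-free.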
Findings
Fine-tuned LLaMA 3.1 8B improves its F1-score by over 35% (absolute) on the MISSCI test split relative to its vanilla baseline.
Synthetic data augmentation boosts zero-shot classification performance.
Significant accuracy gains with limited computational resources.
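Because an F1 gain can be reported in absolute or relative terms, a small worked example may help in reading the headline figure. The sketch below computes F1 from confusion counts and contrasts the two kinds of improvement. The counts are hypothetical and chosen for round numbers; they are not results from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def improvement(baseline_f1, tuned_f1):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = (tuned_f1 - baseline_f1) * 100
    relative_percent = (tuned_f1 - baseline_f1) / baseline_f1 * 100
    return absolute_points, relative_percent


# Hypothetical confusion counts, NOT taken from the paper.
baseline = f1_score(tp=20, fp=30, fn=30)  # roughly 0.4
tuned = f1_score(tp=40, fp=10, fn=10)     # roughly 0.8

absolute_points, relative_percent = improvement(baseline, tuned)
print(f"absolute: {absolute_points:.1f} points, relative: {relative_percent:.1f}%")
```

In this made-up case the same change is a 40-point absolute gain but a 100% relative gain, which is why stating "absolute" matters when quoting the result.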
Abstract
Health-related misinformation is very prevalent and potentially harmful. It is difficult to identify, especially when claims distort or misinterpret scientific findings. We investigate the impact of synthetic data generation and lightweight fine-tuning techniques on the ability of large language models (LLMs) to recognize fallacious arguments using the MISSCI dataset and framework. In this work, we propose MisSynth, a pipeline that applies retrieval-augmented generation (RAG) to produce synthetic fallacy samples, which are then used to fine-tune an LLM. Our results show substantial accuracy gains with fine-tuned models compared to vanilla baselines. For instance, the fine-tuned LLaMA 3.1 8B model achieved an absolute F1-score improvement of over 35% on the MISSCI test split over its vanilla baseline. We demonstrate that introducing synthetic fallacy data to augment limited annotated…
