# MAGNET: Counterfactual samples synthesizing for mitigating hallucination in large language models

**Authors:** Byeong Su Kim, Beomsoo Kim, Beakcheol Jang, Sonia Vasconcelos

PMC · DOI: 10.1371/journal.pone.0340812 · PLOS One · 2026-02-23

## TL;DR

This paper introduces MAGNET, a method to reduce hallucinations in large language models by using counterfactual samples during fine-tuning.

## Contribution

The novelty lies in using counterfactual samples to mitigate biases arising from co-occurrence statistics in the pre-training data.
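To make "co-occurrence statistics" concrete, the following Python sketch counts how many sentences in a toy corpus mention each pair of entities. The corpus, the entity set, and the helper names `cooccurrence_counts` and `most_cooccurring` are hypothetical illustrations, not taken from the paper. The point is only that a model leaning on such counts would favor whichever object co-occurs most often with the subject, even when that object is wrong.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, entities):
    """Count how many sentences mention each unordered pair of entities.

    Uses naive substring matching; pairs are stored in sorted order.
    """
    counts = Counter()
    for sentence in sentences:
        present = sorted(e for e in entities if e in sentence)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def most_cooccurring(counts, subject, candidates):
    """Return the candidate object that co-occurs most often with the subject."""
    def score(obj):
        return counts[tuple(sorted((subject, obj)))]
    return max(candidates, key=score)


# Toy corpus (hypothetical): "Toronto" is mentioned with "Canada" far more often
# than the correct capital "Ottawa" is.
corpus = [
    "Toronto is the largest city in Canada.",
    "Many tourists in Canada visit Toronto.",
    "Canada hosts events in Toronto every year.",
    "Ottawa is the capital of Canada.",
]
entities = {"Canada", "Toronto", "Ottawa"}

counts = cooccurrence_counts(corpus, entities)
print(counts[("Canada", "Toronto")])  # 3
print(most_cooccurring(counts, "Canada", ["Toronto", "Ottawa"]))  # Toronto
```

A purely frequency-driven answer to "What is the capital of Canada?" would return "Toronto" here, which is the kind of co-occurrence bias MAGNET is designed to mitigate.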

## Key findings

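- Applied to the GPT-Neo 2.7B model, MAGNET improved performance in the Factual Knowledge Probing experiment by 12%.
- In the TruthfulQA experiment, fine-tuning the GPT-Neo 125M model on the LAMA-TREx dataset with MAGNET yielded 2.27% better performance than fine-tuning without it.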

## Abstract

Hallucinations are widely recognized as a significant drawback of large language models, and several attempts have been made to reduce them. Among these efforts, our research is directed towards mitigating hallucinations caused by the co-occurrence statistics of pre-training corpora. We introduce the Model-AGNostic countErfacTual synthesis and adaptive fine-tuning framework (MAGNET), a fine-tuning method that mitigates the bias that co-occurrence statistics in pre-training data introduce when large language models generate sentences. Our pipeline uses the language model to generate counterfactual sample sentences together with the subject and object of each counterfactual sample, and filters the outputs to ensure that each contains all three pieces of information (sentence, subject, and object) before it is used as fine-tuning data. It then uses both the generated counterfactual sample and the original sentence from which it was generated as the training dataset. When our method is applied to the GPT-Neo 2.7B model, it yields a 12% improvement in the Factual Knowledge Probing experiment, and a correlation analysis indicates that it can mitigate the bias stemming from the pre-training data. In the TruthfulQA experiment, fine-tuning the GPT-Neo 125M model on the LAMA-TREx dataset with our method yielded 2.27% better performance than fine-tuning without it.
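The filtering and dataset-assembly steps described in the abstract can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `Sentence:` / `Subject:` / `Object:` output template, the function names `parse_generation` and `build_training_set`, and the placeholder strings are all hypothetical, since this summary does not show the paper's prompt or output format. The generation step itself (calling the language model) is omitted.

```python
import re

# Hypothetical output template; the paper's actual prompt/output format is not shown in this summary.
FIELD_RE = re.compile(r"^(Sentence|Subject|Object):[ \t]*(.*)$", re.MULTILINE)
REQUIRED = ("sentence", "subject", "object")


def parse_generation(text):
    """Extract the three fields from one model generation.

    Returns a dict with keys 'sentence', 'subject', 'object', or None if any
    of the three is missing or empty (the filtering step).
    """
    fields = {name.lower(): value.strip() for name, value in FIELD_RE.findall(text)}
    if not all(fields.get(key) for key in REQUIRED):
        return None
    return fields


def build_training_set(pairs):
    """Build fine-tuning data from (original_sentence, generation) pairs.

    Each generation that passes the filter contributes both the original
    sentence and the counterfactual sentence generated from it.
    """
    dataset = []
    for original, generation in pairs:
        fields = parse_generation(generation)
        if fields is None:
            continue  # drop generations lacking a sentence, subject, or object
        dataset.append(original)
        dataset.append(fields["sentence"])
    return dataset


# Usage with placeholder strings (not real data from the paper).
pairs = [
    ("Original sentence 1.",
     "Sentence: Counterfactual sentence 1.\nSubject: subject 1\nObject: object 1"),
    ("Original sentence 2.",
     "Sentence: Counterfactual sentence 2.\nSubject: subject 2"),  # no object: filtered out
]
print(build_training_set(pairs))
# ['Original sentence 1.', 'Counterfactual sentence 1.']
```

Flattening the pairs into a single list of sentences is just one simple choice for illustration; this summary does not say how MAGNET orders or groups the original and counterfactual sentences during its adaptive fine-tuning.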

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12928391/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12928391/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12928391/full.md

---
Source: https://tomesphere.com/paper/PMC12928391