AdversaRiskQA: An Adversarial Factuality Benchmark for High-Risk Domains
Adam Szelestey, Sofie van Engelen, Tianhao Huang, Justin Snelders, Qintao Zeng, Songgaojun Deng

TL;DR
This paper introduces AdversaRiskQA, a benchmark for evaluating how well large language models resist adversarial misinformation in high-risk domains like health, finance, and law, highlighting current model vulnerabilities.
Contribution
It presents the first verified benchmark for adversarial factuality in high-risk domains and proposes automated evaluation methods for assessing model robustness against misinformation.
Findings
Qwen3 (80B) achieves highest accuracy among tested models.
Model performance improves with size but varies across domains.
No significant correlation between injected misinformation and factual output.
Abstract
Hallucination in large language models (LLMs) remains an acute concern, contributing to the spread of misinformation and diminished public trust, particularly in high-risk domains. Among hallucination types, factuality is crucial, as it concerns a model's alignment with established world knowledge. Adversarial factuality, defined as the deliberate insertion of misinformation into prompts with varying levels of expressed confidence, tests a model's ability to detect and resist confidently framed falsehoods. Existing work lacks high-quality, domain-specific resources for assessing model robustness under such adversarial conditions, and no prior research has examined the impact of injected misinformation on long-form text factuality. To address this gap, we introduce AdversaRiskQA, the first verified and reliable benchmark systematically evaluating adversarial factuality across Health,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMisinformation and Its Impacts · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
