# Development and validation of a multi-agent AI pipeline for automated credibility assessment of tobacco misinformation: a proof-of-concept study

**Authors:** Sherif Elmitwalli, John Mehegan, Sophie Braznell, Allen Gallagher

PMC · DOI: 10.3389/frai.2025.1659861 · Frontiers in Artificial Intelligence · 2025-12-19

## TL;DR

A multi-agent AI system was developed to quickly and accurately assess the credibility of tobacco-related misinformation claims, showing strong agreement with human experts.

## Contribution

A novel multi-agent AI pipeline for automated credibility assessment of tobacco misinformation with real-time evidence retrieval and scoring.

## Key findings

- The system achieved substantial agreement with experts (κ = 0.68) and processed claims over 1,000 times faster than manual review.
- It exhibited a conservative bias and did not classify any claims as 'Highly Unlikely', even though experts assigned two claims to that category.
- The framework demonstrated technical feasibility and potential for real-time public health misinformation monitoring.

## Abstract

The proliferation of tobacco-related misinformation poses significant public health risks, requiring scalable solutions for credibility assessment. Traditional manual fact-checking approaches are resource-intensive and cannot match the pace of misinformation spread.

The study aimed to develop and validate a proof-of-concept multi-agent AI pipeline for automated credibility assessment of tobacco misinformation claims and to evaluate its performance against expert human reviewers.

We constructed a three-agent pipeline using OpenAI GPT-4.1 and the CrewAI framework, with the Serper API providing real-time evidence retrieval. The Content Analyzer classifies each claim into one of four types: health impact, scientific assertion, policy, or statistical. The Scientific Fact Verifier queries authoritative sources (WHO, CDC, PubMed Central, Cochrane). The Health Evidence Assessor combines weighted scores across five dimensions into a 0–100 credibility score, which is then mapped onto a five-level scale.
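The Health Evidence Assessor's final step (a weighted sum over five dimensions, then a mapping from the 0–100 score to five levels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dimension names, the weights, the equal-width 20-point bins, and every level label except "Highly Unlikely" are hypothetical placeholders, since this summary does not give them.

```python
from typing import Dict, Tuple

# HYPOTHETICAL dimension names and weights: the paper reports five weighted
# dimensions, but these labels and values are placeholders, not the authors'.
WEIGHTS: Dict[str, float] = {
    "source_authority": 0.30,
    "evidence_strength": 0.25,
    "scientific_consensus": 0.20,
    "claim_specificity": 0.15,
    "recency": 0.10,
}

# HYPOTHETICAL equal-width bins over 0-100. Only "Highly Unlikely" is named
# in the source; the other four labels are assumed for illustration.
LEVELS: Tuple[Tuple[float, str], ...] = (
    (20.0, "Highly Unlikely"),
    (40.0, "Unlikely"),
    (60.0, "Uncertain"),
    (80.0, "Likely"),
    (100.0, "Highly Likely"),
)


def credibility_score(dimension_scores: Dict[str, float]) -> float:
    """Combine five per-dimension scores (each 0-100) into one 0-100 score."""
    if set(dimension_scores) != set(WEIGHTS):
        raise ValueError(f"expected dimensions {sorted(WEIGHTS)}")
    total = sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)
    # Clamp so out-of-range inputs cannot push the score outside 0-100.
    return max(0.0, min(100.0, total))


def credibility_level(score: float) -> str:
    """Map a 0-100 score to a level; bins are [0,20), [20,40), ..., [80,100]."""
    for upper, label in LEVELS:
        if score < upper:
            return label
    return LEVELS[-1][1]  # a score of exactly 100 lands in the top level


if __name__ == "__main__":
    example = {
        "source_authority": 20.0,
        "evidence_strength": 10.0,
        "scientific_consensus": 15.0,
        "claim_specificity": 40.0,
        "recency": 60.0,
    }
    s = credibility_score(example)
    print(f"{s:.1f}", credibility_level(s))
```

Keeping the weights in one table makes the calibration issue raised later in the abstract easy to experiment with: shifting bin edges or weights changes how often claims fall into the lowest level without touching the retrieval agents.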

The framework achieved a mean absolute error (MAE) of 6.25 points against expert scores, a weighted Cohen’s κ of 0.68 (95% CI: 0.52–0.84) indicating substantial agreement, 70% exact category agreement, and 95% adjacent-level agreement. It processed each claim in under 7 s, over 1,000× faster than manual review.
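The four agreement metrics reported here can be computed from paired system and expert outputs roughly as sketched below. This is an illustrative, standard-library implementation, not the authors' analysis code. The summary does not state whether linear or quadratic weights were used for κ, so linear weighting is assumed by default; the confidence interval is not computed here. As a cross-check, scikit-learn's `cohen_kappa_score(y1, y2, weights="linear")` should give the same κ.

```python
from typing import List, Sequence


def mean_absolute_error(system: Sequence[float], expert: Sequence[float]) -> float:
    """Average absolute gap between paired 0-100 scores."""
    return sum(abs(s - e) for s, e in zip(system, expert)) / len(system)


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_levels: int,
                   weighting: str = "linear") -> float:
    """Weighted Cohen's kappa for two raters' level indices 0..n_levels-1.

    Disagreement-weight form: kappa_w = 1 - sum(w * O) / sum(w * E),
    where O is the observed table and E = row_total * col_total / n.
    """
    n = len(a)
    observed: List[List[float]] = [[0.0] * n_levels for _ in range(n_levels)]
    for i, j in zip(a, b):
        observed[i][j] += 1.0
    row = [sum(observed[i]) for i in range(n_levels)]
    col = [sum(observed[i][j] for i in range(n_levels)) for j in range(n_levels)]

    def weight(i: int, j: int) -> float:
        d = abs(i - j) / (n_levels - 1)
        if weighting == "linear":
            return d
        if weighting == "quadratic":
            return d * d
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    cells = [(i, j) for i in range(n_levels) for j in range(n_levels)]
    num = sum(weight(i, j) * observed[i][j] for i, j in cells)
    den = sum(weight(i, j) * row[i] * col[j] / n for i, j in cells)
    if den == 0:
        raise ValueError("kappa undefined: expected disagreement is zero")
    return 1.0 - num / den


def exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of claims placed in the same level by both raters."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Share of claims whose levels differ by at most one step."""
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)
```

Adjacent agreement is always at least as high as exact agreement, which is why the 95% figure exceeds the 70% one; the weighted κ additionally corrects for agreement expected by chance given each rater's level distribution.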

We validated our approach using 20 diverse tobacco claims through intensive expert review (2–4 h per claim). The system exhibited a conservative bias (+3.25 points, p = 0.03) and did not classify any claims as “Highly Unlikely”, even though experts assigned two claims to this category. This proof-of-concept demonstrates technical feasibility and substantial inter-rater agreement while identifying areas for calibration in future large-scale implementations.
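As a rough arithmetic check, comparing the 2–4 h of expert review per claim with the under-7 s processing time gives the speed-up bounds below (actual speed-ups are larger, since 7 s is an upper bound). The bias line assumes, since this summary does not state it, that the bias is the mean signed difference of system score $s_i$ minus expert score $e_i$ over the 20 claims; by the triangle inequality its magnitude cannot exceed the MAE, so the reported +3.25 and 6.25 are mutually consistent.

```latex
\[
\frac{2\ \text{h}}{7\ \text{s}} = \frac{7200\ \text{s}}{7\ \text{s}} \approx 1029,
\qquad
\frac{4\ \text{h}}{7\ \text{s}} = \frac{14400\ \text{s}}{7\ \text{s}} \approx 2057
\]
\[
\bar{d} = \frac{1}{20}\sum_{i=1}^{20}\left(s_i - e_i\right) = +3.25,
\qquad
\lvert \bar{d} \rvert \le \frac{1}{20}\sum_{i=1}^{20}\left\lvert s_i - e_i \right\rvert = \mathrm{MAE} = 6.25
\]
```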

Our proof-of-concept agentic AI pipeline demonstrates substantial agreement with expert assessments of tobacco-related claims while providing dramatic speed improvements. By combining zero-shot LLM reasoning, retrieval-grounded evidence verification, and a transparent five-level scoring schema, the system offers a practical tool for real-time misinformation monitoring in public health. This proof-of-concept establishes technical feasibility for automated tobacco misinformation assessment, with validation results supporting further development and larger-scale testing before operational deployment.

## Full-text entities

- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12757325/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12757325/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/PMC12757325/full.md

---
Source: https://tomesphere.com/paper/PMC12757325