# Identifying and Analyzing Bot-Generated Responses in Online Health Care Surveys: Methodological Study

**Authors:** Emily Hamovitch, Kaileah McKellar, Walter P Wodchis

PMC · DOI: 10.2196/73622 · 2026-03-05

## TL;DR

This study develops methods to detect bot-generated responses in online health surveys and shows how bots can distort data and research conclusions.

## Contribution

The paper introduces a 3-tier classification system for detecting bot responses in online health care surveys and demonstrates how unfiltered bot responses compromise data validity.

## Key findings

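- 58% of survey responses (668/1154) were classified as suspected bot-generated.
- Suspected bots concentrated their answers in the middle of Likert scales, whereas probable humans were more likely to select extreme values.
- Expected correlations between key health indicators (eg, depression symptoms) were present in probable human responses but reversed in suspected bot data, indicating compromised validity.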

## Abstract

The increasing reliance on online surveys for collecting patient-reported feedback for health care research has led to growing concerns over fraudulent responses generated by bots. These automated responses threaten data integrity by fabricating survey results, distorting statistical analyses, and potentially misguiding policy decisions. Addressing this issue is critical for maintaining the validity of research findings that inform health care practice and policy.

This study aimed to develop a robust set of criteria for identifying bot-generated responses in online health care surveys and to examine how these responses impact data quality. We then explored differences in survey results between probable human and suspected bot respondents in a survey assessing patient-reported outcome measures and patient-reported experience measures within a geographic region in Ontario, Canada.

A survey was conducted from July to October 2023 using Research Electronic Data Capture (REDCap; Vanderbilt University), distributed with a generic link via email, and later shared on social media. The survey collected data on health care use, patient experiences, health outcomes, digital health care engagement, and demographics. A 3-tier classification system was developed to detect bot responses based on predefined “red flags,” including duplicate open-ended responses, inconsistent demographic reporting, identical timestamps, and location discrepancies. Quantitative analysis included chi-square tests to assess differences between probable human and suspected bot responses and Spearman correlation tests to examine relationships among health care indicators.
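The four red flags named above can be illustrated with a minimal sketch. The field names (`open_text`, `submitted_at`, `age`, `birth_year`, `postal_region`, `ip_region`), the one-year tolerance for age consistency, and the flag-count-to-tier mapping in `assign_tier` are all hypothetical; the abstract does not specify how the study's 3 tiers were defined, so this is not the authors' rule set.

```python
from collections import Counter

SURVEY_YEAR = 2023  # the survey ran from July to October 2023


def find_red_flags(responses):
    """Return one set of red-flag names per response.

    Each response is a dict with hypothetical keys: 'open_text',
    'submitted_at', 'age', 'birth_year', 'postal_region', 'ip_region'.
    """
    # Normalize free text so copies differing only in case or spacing match.
    texts = [" ".join(r["open_text"].lower().split()) for r in responses]
    text_counts = Counter(t for t in texts if t)
    time_counts = Counter(r["submitted_at"] for r in responses)

    results = []
    for r, text in zip(responses, texts):
        flags = set()
        if text and text_counts[text] > 1:
            flags.add("duplicate_open_text")
        if time_counts[r["submitted_at"]] > 1:
            flags.add("identical_timestamp")
        # Assumed rule: reported age should match birth year within 1 year.
        if abs((SURVEY_YEAR - r["birth_year"]) - r["age"]) > 1:
            flags.add("inconsistent_demographics")
        # Assumed rule: self-reported region should match the inferred one.
        if r["postal_region"] != r["ip_region"]:
            flags.add("location_discrepancy")
        results.append(flags)
    return results


def assign_tier(flags):
    """Map a flag set to a tier. This mapping is an illustrative assumption."""
    if not flags:
        return "probable human"
    if len(flags) == 1:
        return "possible bot"
    return "suspected bot"
```

Returning flag names rather than a bare score keeps each classification auditable, which matters when a single criterion (such as duplicated open text) accounts for a large share of flagged responses.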

Analysis included 1154 responses, with 58% (n=668) classified as suspected bot-generated. The most frequent suspected bot-identification criterion was duplicated open-ended responses (293/668, 44%). Chi-square tests revealed statistically significant differences (P<.05) between suspected bots and probable humans across most survey items. Suspected bots demonstrated response patterns concentrated in the middle of Likert scales, whereas probable humans were more likely to select extreme values. Correlation analyses showed that expected relationships between key health indicators (eg, depression symptoms) were present in probable human responses but reversed in suspected bot-generated data, highlighting the potential for compromised validity in unfiltered survey datasets.
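The two tests used in this analysis can be sketched in plain Python to show what each one measures. This is a from-scratch illustration, not the authors' analysis code: function names are hypothetical, only the test statistics are computed (no P values), and the chi-square function assumes every row and column of the table has at least one response.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table (rows = respondent groups, columns = answer options)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and answer were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

In practice, the chi-square test would be run once per survey item on a group-by-answer table, and the Spearman correlation would be computed separately within the probable human and suspected bot groups; a sign flip between the two coefficients is the kind of reversal the study reports. Tie-averaged ranks matter here because Likert items produce many tied values.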

The findings underscore the necessity of implementing bot prevention and detection methods in online health care surveys to preserve data integrity. Failure to do so risks distorting research conclusions, particularly in health equity studies where demographic misclassification may bias results. The study highlights effective bot detection strategies, including open-text analysis, timestamp evaluation, and geographic validation, and recommends integrating these techniques into survey design. As bots continue to evolve, ongoing advancements in bot prevention and detection will be crucial to maintaining the reliability of digital health research.

## Full-text entities

- **Diseases:** depression (MESH:D003866)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978908/full.md

---
Source: https://tomesphere.com/paper/PMC12978908