Using Epidemiological Test Diagnostics to Select Fraud Detection Methods: Secondary Analysis of Quantitative Cross-Sectional Survey Data
Rachel Willard-Grace, Tali Klima, Mansi Dedhia, Emily Lo, Annie Nisnevich, Allison Gray, Holly Henry

TL;DR
This study evaluates how well different methods detect bot attacks in surveys, showing that most are ineffective and suggesting better strategies to protect survey data integrity.
Contribution
The paper introduces an epidemiological framework to assess fraud detection methods in surveys, revealing their limitations and proposing more effective strategies.
Findings
Most recommended methods for detecting bot attacks in surveys have low sensitivity and perform worse than chance.
Combinations of fraud markers and repeated response blocks are more effective at identifying bot attacks.
Failure to remove fraudulent records significantly alters survey results, especially in demographics and health care outcomes.
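As an illustration of the diagnostic test framework named above, the sketch below scores a single fraud marker, and a combination of markers, against a reference classification of each record. It is a minimal sketch, not the authors' code. The toy data, the marker names, the `min_hits` threshold, and the reading of "worse than chance" as a Youden's J at or below zero are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    sensitivity: float  # share of fraudulent records the marker flags
    specificity: float  # share of legitimate records the marker leaves alone
    ppv: float          # share of flagged records that are fraudulent
    npv: float          # share of unflagged records that are legitimate
    youden_j: float     # sensitivity + specificity - 1; 0 means no better than chance


def _ratio(numerator, denominator):
    # Return NaN rather than raising when a cell total is empty.
    return numerator / denominator if denominator else float("nan")


def evaluate_marker(flagged, is_fraud):
    """Compare a marker's flags with the reference standard, record by record."""
    pairs = list(zip(flagged, is_fraud))
    tp = sum(f and y for f, y in pairs)
    fp = sum(f and not y for f, y in pairs)
    fn = sum(not f and y for f, y in pairs)
    tn = sum(not f and not y for f, y in pairs)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return Diagnostics(sens, spec, _ratio(tp, tp + fp), _ratio(tn, tn + fn),
                       sens + spec - 1)


def combine_markers(marker_columns, min_hits=2):
    """Flag a record when at least `min_hits` markers fire (hypothetical rule)."""
    return [sum(row) >= min_hits for row in zip(*marker_columns)]


# Hypothetical toy data: 8 records, the first 4 fraudulent by the reference standard.
is_fraud = [True, True, True, True, False, False, False, False]
marker_a = [True, False, False, False, True, True, False, False]
marker_b = [True, True, True, False, False, False, True, False]
marker_c = [True, True, False, True, False, True, False, False]

single = evaluate_marker(marker_a, is_fraud)
combined = evaluate_marker(combine_markers([marker_a, marker_b, marker_c]), is_fraud)

print(single)
print(combined)
```

In this toy data, marker A alone has a negative Youden's J, while requiring two of the three markers yields a positive one. The numbers only demonstrate the calculation and say nothing about the study's actual results.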
Abstract
Survey research has the potential to elevate the experiences and opinions of marginalized populations. The rising number of bot attacks, a method of participant fraud that creates multiple records in survey data using automated software, threatens to drown out those voices and produce inaccurate findings. Rapid identification and mitigation of bot attacks are vital; however, there is limited guidance for researchers on scalable approaches to address this problem. This study aimed to assess how well recommended methods detect fraud using an epidemiological diagnostic test framework to inform web-based survey researchers on how best to identify and shut down bot attacks. We analyzed data from a cross-sectional web-based statewide survey on access to pediatric subspecialty care in California that used Qualtrics survey software. Caregivers of children with chronic conditions were…
Taxonomy
Topics: Survey Methodology and Nonresponse · Imbalanced Data Classification Techniques · Statistical Methods in Epidemiology
