TL;DR
This study compares several machine learning algorithms, including random forests and neural networks, for automating autism spectrum disorder (ASD) surveillance from the text of children's evaluations. It finds that random forests perform as well as newer models and yield accurate prevalence estimates.
Contribution
The paper evaluates and compares multiple supervised learning algorithms for ASD surveillance, highlighting the effectiveness of random forests in prevalence estimation.
Findings
Random forests and the support vector machine with Naive Bayes features (NB-SVM) each achieved over 87% accuracy.
Random forests provided prevalence estimates close to true prevalence.
NB-SVM had more false negatives, limiting its surveillance utility.
Abstract
The Centers for Disease Control and Prevention (CDC) coordinates a labor-intensive process to measure the prevalence of autism spectrum disorder (ASD) among children in the United States. Random forests methods have shown promise in speeding up this process, but they lag behind human classification accuracy by about 5%. We explore whether more recently available document classification algorithms can close this gap. We applied 8 supervised learning algorithms to predict whether children meet the case definition for ASD based solely on the words in their evaluations. We compared the algorithms' performance across 10 random train-test splits of the data, using classification accuracy, F1 score, and number of positive calls to evaluate their potential use for surveillance. Across the 10 train-test cycles, the random forest and support vector machine with Naive Bayes features (NB-SVM) each…
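The three evaluation quantities named in the abstract (classification accuracy, F1 score, and number of positive calls) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `evaluate_calls`, the returned keys, and the toy labels are all assumptions made for the example. It also shows why extra false negatives matter for surveillance: a classifier can look accurate while undercounting cases.

```python
from typing import Dict, Sequence


def evaluate_calls(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Score binary predictions (1 = meets the ASD case definition, 0 = does not).

    Hypothetical helper for illustration; names are not from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    n = len(pairs)
    f1_denominator = 2 * tp + fp + fn
    return {
        "accuracy": (tp + tn) / n,
        "f1": (2 * tp / f1_denominator) if f1_denominator else 0.0,
        # Positive calls are what a surveillance system would count as cases.
        "positive_calls": tp + fp,
        "estimated_prevalence": (tp + fp) / n,
        "true_prevalence": (tp + fn) / n,
    }


# Toy labels invented for illustration only; they are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

metrics = evaluate_calls(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the classifier is right on 8 of 10 children, yet it makes only 2 positive calls against 4 true cases, so the estimated prevalence is half the true prevalence. Repeating such a scoring step over several random train-test splits, as the abstract describes, would give a distribution of each metric rather than a single value.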
