Can vectors read minds better than experts? Comparing data augmentation strategies for the automated scoring of children's mindreading ability
Venelin Kovatchev, Phillip Smith, Mark Lee, and Rory Devine

TL;DR
This study compares seven data augmentation strategies for automatically scoring children's mindreading ability. Task-specific augmentations improve both performance and generalization, while automatic vector-based methods perform poorly.
Contribution
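The paper implements and evaluates seven data augmentation strategies for mindreading assessment, introduces the UK-MIND-20 corpus (10,320 question-answer pairs) for testing generalization to unseen data, and sets a new state of the art on the MIND-CA corpus, improving macro-F1 by 6 points.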
Findings
Task-specific augmentations outperform task-agnostic ones.
Automatic vector-based augmentations perform poorly.
Augmentation improves generalization to unseen data.
Abstract
In this paper we implement and compare 7 different data augmentation strategies for the task of automatic scoring of children's ability to understand others' thoughts, feelings, and desires (or "mindreading"). We recruit in-domain experts to re-annotate augmented samples and determine to what extent each strategy preserves the original rating. We also carry out multiple experiments to measure how much each augmentation strategy improves the performance of automatic scoring systems. To determine the capabilities of automatic systems to generalize to unseen data, we create UK-MIND-20 - a new corpus of children's performance on tests of mindreading, consisting of 10,320 question-answer pairs. We obtain a new state-of-the-art performance on the MIND-CA corpus, improving macro-F1-score by 6 points. Results indicate that both the number of training examples and the quality of the…
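The abstract reports its headline result in macro-F1. As a reminder of what that metric measures, here is a minimal sketch in plain Python. The function name `macro_f1`, the 0-2 rating scale, and the example labels are all hypothetical and are not taken from the paper or its corpora.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and the same length")
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    # Every label occurs at least once in gold or pred, so the denominator
    # 2*TP + FP + FN is always positive.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Hypothetical ratings on an assumed 0-2 scale (illustrative only).
gold = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(gold, pred)
print(round(score, 4))
```

Because every class contributes equally to the average, a system cannot reach a high macro-F1 by doing well only on the most frequent rating. Assuming "points" refers to the 0-100 scale, a 6-point gain corresponds to 0.06 on the 0-1 scale used in this sketch.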
