Collecting Prosody in the Wild: A Content-Controlled, Privacy-First Smartphone Protocol and Empirical Evaluation

Timo K. Koch; Florian Bemmann; Ramona Schoedel; Markus Buehner; Clemens Stachl

arXiv:2603.17061·cs.HC·March 19, 2026

Open Access

TL;DR

This paper presents a privacy-preserving smartphone protocol that collects prosodic speech data from scripted read-aloud sentences, enabling large-scale, standardized, and ethically compliant analysis of natural variation in prosodic delivery.

Contribution

The paper introduces a novel content-controlled, privacy-first protocol with on-device feature extraction and evaluates it in a large-scale empirical study (N = 560; 9,877 recordings).

Findings

01 High participant compliance and data quality

02 Successful prediction of speaker sex from prosodic features

03 Effective concurrent prediction of affective states

Abstract

Collecting everyday speech data for prosodic analysis is challenging due to the confounding of prosody and semantics, privacy constraints, and participant compliance. We introduce and empirically evaluate a content-controlled, privacy-first smartphone protocol that uses scripted read-aloud sentences to standardize lexical content (including prompt valence) while capturing natural variation in prosodic delivery. The protocol performs on-device prosodic feature extraction, deletes raw audio immediately, and transmits only derived features for analysis. We deployed the protocol in a large study (N = 560; 9,877 recordings), evaluated compliance and data quality, and conducted diagnostic prediction tasks on the extracted features, predicting speaker sex and concurrently reported momentary affective states (valence, arousal). We discuss implications and directions for advancing and deploying…
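
The privacy flow described in the abstract (extract features on the device, delete the raw audio immediately, transmit only derived features) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the synthetic test tone standing in for a recording, and the three toy features (duration, RMS energy, and a crude zero-crossing pitch proxy). The abstract does not list the paper's actual feature set or implementation, so this should not be read as the authors' method.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, freq_hz=120.0, seconds=0.5, rate=16000):
    """Write a synthetic mono 16-bit sine tone (a stand-in for a real recording)."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * i / rate)))
        for i in range(n)
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)


def extract_features(path):
    """Compute toy prosody-like features; hypothetical, not the paper's feature set."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n = w.getnframes()
        raw = w.readframes(n)
    samples = struct.unpack("<%dh" % n, raw)
    duration = n / rate
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes; two per cycle gives a rough frequency estimate.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {
        "duration_s": duration,
        "rms": rms,
        "pitch_proxy_hz": crossings / (2 * duration),
    }


def process_recording(path):
    """Extract features, then delete the raw audio even if extraction fails."""
    try:
        features = extract_features(path)
    finally:
        os.remove(path)
    return features  # only this derived payload would leave the device


# Demo: record (simulated), process, and confirm the raw file is gone.
fd, path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
write_test_tone(path)
payload = process_recording(path)
audio_deleted = not os.path.exists(path)
```

Placing the deletion in a `finally` clause is one way to make sure no raw audio lingers when feature extraction raises an error. A real deployment would also need to handle microphone capture, secure transmission, and a far richer feature set, none of which this sketch attempts.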

Taxonomy

Topics: Emotion and Mood Recognition · Phonetics and Phonology Research · Voice and Speech Disorders