Predicting gender and age categories in English conversations using lexical, non-lexical, and turn-taking features
Andreas Liesenfeld, Gábor Parti, Yu-Yin Hsu, Chu-Ren Huang

TL;DR
This study analyzes British English conversations to identify linguistic and turn-taking features that distinguish gender and age groups, and uses these features to predict speaker categories with notable accuracy.
Contribution
It introduces a novel approach combining lexical, non-lexical, and turn-taking features for predicting gender and age in conversational data.
Findings
Female speakers tend to produce more and slightly longer turns.
Turns by male speakers show a higher type-token ratio and a distinct range of minimal particles such as 'eh', 'uh' and 'em'.
Young speakers use more swear words and laughter.
Abstract
This paper examines gender and age salience and (stereo)typicality in British English talk with the aim of predicting gender and age categories based on lexical, phrasal and turn-taking features. We examine the SpokenBNC, a corpus of around 11.4 million words of British English conversations, and identify behavioural differences between speakers that are labelled for gender and age categories. We explore differences in language use and turn-taking dynamics and identify a range of characteristics that set the categories apart. We find that female speakers tend to produce more and slightly longer turns, while turns by male speakers feature a higher type-token ratio and a distinct range of minimal particles such as "eh", "uh" and "em". Across age groups, we observe, for instance, that swear words and laughter characterize young speakers' talk, while old speakers tend to produce more truncated…
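The turn-level measures named in the abstract (number of turns, turn length, type-token ratio) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `turn_features`, the speaker labels "A" and "B", the toy utterances, and the whitespace tokenization are all assumptions made for illustration only.

```python
from collections import defaultdict


def turn_features(turns):
    """Compute simple per-label turn statistics.

    `turns` is a list of (label, utterance) pairs, one pair per turn.
    The label could be a gender or age category; here it is hypothetical.
    """
    tokens_by_label = defaultdict(list)
    turn_counts = defaultdict(int)
    for label, utterance in turns:
        # Naive whitespace tokenization (an assumption, not the paper's method).
        tokens = utterance.lower().split()
        tokens_by_label[label].extend(tokens)
        turn_counts[label] += 1

    features = {}
    for label, tokens in tokens_by_label.items():
        n_turns = turn_counts[label]
        features[label] = {
            "turns": n_turns,
            # Mean turn length in tokens.
            "mean_turn_length": len(tokens) / n_turns,
            # Type-token ratio: distinct word forms divided by total tokens.
            "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        }
    return features


# Toy data, invented for this example.
toy_turns = [
    ("A", "well I think we should go"),
    ("B", "uh yeah"),
    ("A", "I think so too"),
    ("B", "eh maybe"),
]
features = turn_features(toy_turns)
```

Type-token ratio is sensitive to sample size, so comparing groups fairly would normally require equal-sized samples or a length-normalized variant; the sketch above does not address that.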
Taxonomy
Topics: Language, Discourse, Communication Strategies · Swearing, Euphemism, Multilingualism · Gender Studies in Language
