Who Is Missing? Characterizing the Participation of Different Demographic Groups in a Korean Nationwide Daily Conversation Corpus

Haewoon Kwak; Jisun An; Kunwoo Park

arXiv:2204.09209·cs.CL·April 21, 2022

TL;DR

This paper analyzes a Korean nationwide daily conversation corpus to characterize how different age and sex groups participate, addressing the scarcity of demographic information about participants in conversation corpora used for conversational AI.

Contribution

It provides a detailed demographic analysis of a large Korean conversation corpus, characterizing how participation differs across age and sex groups.

Findings

01. Demographic participation varies significantly across age groups.

02. Women and younger participants are more represented in the corpus.

03. Insights can inform more inclusive conversational AI development.

Abstract

A conversation corpus is essential to build interactive AI applications. However, the demographic information of the participants in such corpora is largely underexplored mainly due to the lack of individual data in many corpora. In this work, we analyze a Korean nationwide daily conversation corpus constructed by the National Institute of Korean Language (NIKL) to characterize the participation of different demographic (age and sex) groups in the corpus.
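
As an illustration of what characterizing participation by age and sex can look like in practice, the following is a minimal Python sketch. It is not the authors' method: the record fields (`age_group`, `sex`), the helper names, and the toy speaker list and reference shares are all hypothetical and made up for the example. It computes each group's share of speakers and divides it by a reference population share, so a ratio above 1 suggests over-representation and a ratio below 1 suggests under-representation.

```python
from collections import Counter


def participation_shares(speakers):
    """Return each (age_group, sex) group's share of all speakers."""
    counts = Counter((s["age_group"], s["sex"]) for s in speakers)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def representation_ratio(corpus_shares, reference_shares):
    """Corpus share divided by reference share, per group.

    A group absent from the corpus gets a ratio of 0.0.
    Groups with a zero reference share are skipped.
    """
    return {
        group: corpus_shares.get(group, 0.0) / ref
        for group, ref in reference_shares.items()
        if ref > 0
    }


# Toy, made-up speaker records (hypothetical field names and values).
speakers = [
    {"age_group": "20s", "sex": "F"},
    {"age_group": "20s", "sex": "F"},
    {"age_group": "20s", "sex": "M"},
    {"age_group": "60s", "sex": "M"},
]

# Made-up reference shares, standing in for census-style population data.
reference = {
    ("20s", "F"): 0.25,
    ("20s", "M"): 0.25,
    ("60s", "F"): 0.25,
    ("60s", "M"): 0.25,
}

shares = participation_shares(speakers)
ratios = representation_ratio(shares, reference)

for group in sorted(ratios):
    print(group, round(ratios[group], 2))
```

Groups that never appear in the corpus, such as `("60s", "F")` in the toy data, surface with a ratio of 0.0, which is the kind of "missing" group the title asks about.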

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems