Separating Content from Speaker Identity in Speech for the Assessment of Cognitive Impairments

Dongseok Heo; Cheul Young Park; Jaemin Cheun; Myung Jin Ko

arXiv:2203.10827 · eess.AS · March 22, 2022

Open Access

TL;DR

This paper investigates whether separating content from speaker identity in speech improves cognitive impairment assessment. It finds that content embeddings are more effective than speaker embeddings, but that their effectiveness depends on the information encoded in the speaker embeddings.

Contribution

It uses a voice conversion framework to separate content from speaker identity in speech and evaluates the resulting content embeddings against speaker embeddings for cognitive impairment assessment.

Findings

01. Content embeddings outperform speaker embeddings in assessment accuracy.

02. The effectiveness of content embeddings depends on the information encoded in speaker embeddings.

03. Simple classifiers using content embeddings show promising results on the DementiaBank Pitt Corpus.

Abstract

Deep speaker embeddings have been shown to be effective for assessing cognitive impairments, aside from their original purpose of speaker verification. However, research has found that speaker embeddings encode not only speaker identity but also an array of other information, including speaker demographics such as sex and age and, to an extent, speech content, all of which are known confounders in the assessment of cognitive impairments. In this paper, we hypothesize that content information separated from speaker identity using a voice conversion framework is more effective for assessing cognitive impairments, and we train simple classifiers for a comparative analysis on the DementiaBank Pitt Corpus. Our results show that while content embeddings have an advantage over speaker embeddings for the defined problem, further experiments show that their effectiveness depends on the information encoded in speaker embeddings due to…
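The comparative setup described in the abstract (the same simple classifier trained on two kinds of embeddings and scored under the same protocol) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the authors' method: the nearest-centroid classifier, the 5-fold split, and the helper names (`make_synthetic_embeddings`, `cross_validate`) are all hypothetical, and the data are random Gaussian clusters standing in for real embeddings. The separation values are arbitrary and say nothing about the paper's results.

```python
import random
from statistics import mean


def make_synthetic_embeddings(n_per_class, dim, separation, seed):
    """Hypothetical stand-in for real embeddings: two Gaussian clusters."""
    rng = random.Random(seed)
    data = []
    for label in (0, 1):
        for _ in range(n_per_class):
            vec = [rng.gauss(label * separation, 1.0) for _ in range(dim)]
            data.append((vec, label))
    rng.shuffle(data)
    return data


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per class on train, then score on test."""
    centroids = {
        label: centroid([vec for vec, y in train if y == label])
        for label in (0, 1)
    }
    correct = 0
    for vec, y in test:
        predicted = min(
            centroids, key=lambda c: squared_distance(vec, centroids[c])
        )
        correct += predicted == y
    return correct / len(test)


def cross_validate(data, k=5):
    """Mean accuracy over k folds (every k-th item forms one fold)."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        scores.append(nearest_centroid_accuracy(train, folds[i]))
    return mean(scores)


# Synthetic placeholders only; the separations are arbitrary assumptions.
embeddings_a = make_synthetic_embeddings(50, 16, separation=0.8, seed=0)
embeddings_b = make_synthetic_embeddings(50, 16, separation=0.1, seed=1)

acc_a = cross_validate(embeddings_a)
acc_b = cross_validate(embeddings_b)
print(f"embedding set A: {acc_a:.2f}, embedding set B: {acc_b:.2f}")
```

With real data, `embeddings_a` and `embeddings_b` would be replaced by (vector, label) pairs extracted from the same recordings by the two embedding models, so that any accuracy gap reflects the representation rather than the classifier or the split.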

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling