Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks

Haoqi Li; Brian Baucom; Shrikanth Narayanan; Panayiotis Georgiou

arXiv:2104.03899·eess.AS·April 9, 2021

TL;DR

This paper introduces an unsupervised learning framework using Triplet-Enhanced Contextualized Networks to extract behavioral information from speech, enabling cross-domain behavior modeling without manual annotations.

Contribution

It proposes a novel unsupervised representation learning method with a Triplet-Enhanced Deep Contextualized Network for behavior modeling from speech.

Findings

01. Effective in capturing behavioral context from speech

02. Generalizes across multiple domains including therapy and entertainment

03. Demonstrates promising results in unsupervised behavior recognition

Abstract

Speech encodes a wealth of information related to human behavior and has been used in a variety of automated behavior recognition tasks. However, extracting behavioral information from speech remains challenging, in part because training data are scarce: specific behavioral patterns often occur with low frequency. Moreover, supervised behavioral modeling typically relies on domain-specific construct definitions and corresponding manually annotated data, which makes generalizing across domains difficult. In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech in an unsupervised way. We hypothesize that nearby segments of speech share the same behavioral context and hence map onto similar underlying behavioral representations. We…
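The hypothesis that nearby speech segments share a behavioral context can be illustrated with a generic triplet objective. The sketch below is not the authors' implementation: the function names, the `window` parameter, the choice to draw negatives from a different session, and the squared Euclidean margin loss are all assumptions made for illustration, and the embeddings are plain lists of floats rather than network outputs.

```python
import random


def sample_triplet(sessions, window=2, rng=None):
    """Pick (anchor, positive, negative) segment indices.

    Assumptions (hypothetical, not from the paper):
    - `sessions` maps a session id to a list of segment embeddings,
      with at least two non-empty sessions;
    - the positive lies within `window` segments of the anchor;
    - the negative comes from a different session.
    """
    rng = rng or random.Random()
    # The anchor session needs at least two segments to have a neighbour.
    eligible = [s for s, segs in sessions.items() if len(segs) >= 2]
    anchor_sess = rng.choice(eligible)
    n = len(sessions[anchor_sess])
    i = rng.randrange(n)
    neighbours = [
        j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i
    ]
    j = rng.choice(neighbours)
    other_sess = rng.choice([s for s in sessions if s != anchor_sess])
    k = rng.randrange(len(sessions[other_sess]))
    return (anchor_sess, i), (anchor_sess, j), (other_sess, k)


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero once the negative is farther
    from the anchor than the positive by at least `margin`."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Under these assumptions, minimizing the loss pulls temporally adjacent segments together in the embedding space and pushes segments from unrelated interactions apart, which is one common way to turn a "nearby means similar" hypothesis into a training signal without labels.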

Taxonomy

Topics: Music and Audio Processing · Human Pose and Action Recognition · Emotion and Mood Recognition