Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks
Haoqi Li, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

TL;DR
This paper introduces an unsupervised learning framework using Triplet-Enhanced Contextualized Networks to extract behavioral information from speech, enabling cross-domain behavior modeling without manual annotations.
Contribution
The paper proposes a novel unsupervised representation learning method for modeling behavior from speech, built on a Triplet-Enhanced Deep Contextualized Network.
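A triplet-enhanced objective typically pulls an anchor embedding toward a positive example and pushes it away from a negative example by at least a margin. The standard triplet margin loss is shown below only as a reference point. Whether the paper uses exactly this form (squared Euclidean distance, a fixed margin $\alpha$) is an assumption. The symbols $f$ (the encoder), $x_a$, $x_p$, $x_n$ (anchor, positive and negative speech segments) and $\alpha$ are illustrative and do not come from the source.

```latex
\mathcal{L} \;=\; \sum_{(a,p,n)} \max\Big(0,\;
  \lVert f(x_a) - f(x_p) \rVert_2^2
  \;-\; \lVert f(x_a) - f(x_n) \rVert_2^2
  \;+\; \alpha \Big)
```

The loss is zero once every negative lies at least $\alpha$ farther from its anchor than the corresponding positive does.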
Findings
Effective in capturing behavioral context from speech
Generalizes across multiple domains including therapy and entertainment
Demonstrates promising results in unsupervised behavior recognition
Abstract
Speech encodes a wealth of information related to human behavior and has been used in a variety of automated behavior recognition tasks. However, extracting behavioral information from speech remains challenging, in part because training data are scarce: specific behavioral patterns often occur infrequently. Moreover, supervised behavioral modeling typically relies on domain-specific construct definitions and correspondingly manually annotated data, which makes generalizing across domains difficult. In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method that captures behavioral information from speech in an unsupervised way. We hypothesize that nearby segments of speech share the same behavioral context and hence map onto similar underlying behavioral representations. We…
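The hypothesis above suggests a self-supervision signal that needs no labels: treat temporally adjacent segments as positives and segments from other interactions as negatives. The sketch below draws such (anchor, positive, negative) triplets as index pairs. It is a minimal illustration under stated assumptions, not the authors' sampling procedure. The function name `sample_triplets`, the `max_offset` neighborhood, and the choice to draw negatives from a different session are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# A segment is identified by (session_id, segment_index); hypothetical layout.
Segment = Tuple[int, int]
Triplet = Tuple[Segment, Segment, Segment]


def sample_triplets(
    session_lengths: Sequence[int],
    num_triplets: int,
    max_offset: int = 1,
    seed: int = 0,
) -> List[Triplet]:
    """Hypothetical sampler returning (anchor, positive, negative) triplets.

    Positive: a segment within `max_offset` positions of the anchor in the
    same session (assumed to share behavioral context).
    Negative: a random segment from a different session (assumed not to).
    """
    if len(session_lengths) < 2:
        raise ValueError("need at least two sessions to draw negatives")
    if any(n < 2 for n in session_lengths):
        raise ValueError("each session needs at least two segments")
    if max_offset < 1:
        raise ValueError("max_offset must be >= 1")

    rng = random.Random(seed)
    triplets: List[Triplet] = []
    for _ in range(num_triplets):
        # Pick an anchor session and segment uniformly at random.
        s = rng.randrange(len(session_lengths))
        n = session_lengths[s]
        a = rng.randrange(n)

        # Positive: any neighbor within max_offset, excluding the anchor itself.
        lo, hi = max(0, a - max_offset), min(n - 1, a + max_offset)
        candidates = [i for i in range(lo, hi + 1) if i != a]
        p = rng.choice(candidates)

        # Negative: a random segment from some other session.
        other = rng.choice([t for t in range(len(session_lengths)) if t != s])
        neg = rng.randrange(session_lengths[other])

        triplets.append(((s, a), (s, p), (other, neg)))
    return triplets


if __name__ == "__main__":
    for anchor, pos, neg in sample_triplets([10, 8, 12], num_triplets=5, max_offset=2):
        print(anchor, pos, neg)
```

For example, `sample_triplets([10, 8, 12], num_triplets=5, max_offset=2)` returns five triplets drawn from three sessions of 10, 8 and 12 segments. Drawing negatives from another session is a conservative choice: if behavior is roughly stationary over a whole interaction, even a distant segment of the same session could share the anchor's behavioral context.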
Taxonomy
Topics: Music and Audio Processing · Human Pose and Action Recognition · Emotion and Mood Recognition
