Synthetic Audio Helps for Cognitive State Tasks

Adil Soubki; John Murzaku; Peter Zeng; Owen Rambow

arXiv:2502.06922 · cs.SD · February 12, 2025

TL;DR

This paper introduces Synthetic Audio Data fine-tuning (SAD), a framework that improves performance on cognitive state modeling tasks by pairing text with synthetic audio generated by an off-the-shelf text-to-speech (TTS) system, outperforming text-only training.

Contribution

The paper presents a multimodal training framework that adds zero-shot synthetic audio to text-only corpora and shows gains over text-only training on 7 cognitive state tasks.

Findings

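01. Adding synthetic audio to text-only corpora improves performance on cognitive state tasks.
02. On corpora that contain gold audio, SAD (text plus synthetic audio) is competitive with training on text plus gold audio.
03. Multimodal training enhances model robustness.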

Abstract

The NLP community has broadly focused on text-only approaches to cognitive state tasks, but audio can provide vital missing cues through prosody. We posit that text-to-speech models learn to track aspects of cognitive state in order to produce naturalistic audio, and that the signal audio models implicitly identify is orthogonal to the information that language models exploit. We present Synthetic Audio Data fine-tuning (SAD), a framework where we show that 7 tasks related to cognitive state modeling benefit from multimodal training on both text and zero-shot synthetic audio data from an off-the-shelf TTS system. We show an improvement over the text-only modality when adding synthetic audio data to text-only corpora. Furthermore, on tasks and corpora that do contain gold audio, we show our SAD framework achieves competitive performance with text and synthetic audio compared to text and…
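To make the data-construction idea concrete, here is a minimal Python sketch: every text-only training example is paired with zero-shot synthetic audio, and features from both modalities are concatenated for a downstream classifier. All names are hypothetical. `placeholder_tts` stands in for an off-the-shelf TTS system (it only emits a tone, louder for questions), and `text_features` / `audio_features` stand in for a language model and an audio encoder. Concatenation (early fusion) is an assumption chosen for brevity, not necessarily how SAD combines modalities, and the sketch covers only dataset construction, not training.

```python
import math
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

SAMPLE_RATE = 16_000                  # hypothetical output rate of the placeholder TTS
SAMPLES_PER_CHAR = SAMPLE_RATE // 20  # 0.05 s of audio per character


def placeholder_tts(text: str) -> List[float]:
    """Hypothetical stand-in for an off-the-shelf TTS system.

    Emits a 220 Hz tone whose length grows with the text and whose amplitude
    is higher for questions, a crude proxy for a prosodic cue.
    """
    n = SAMPLES_PER_CHAR * max(len(text), 1)
    amp = 0.8 if text.rstrip().endswith("?") else 0.4
    return [amp * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(n)]


def text_features(text: str, dim: int = 8) -> List[float]:
    """Hashed bag of words; a placeholder for a language-model embedding."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's salted str hash
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def audio_features(wave: Sequence[float]) -> List[float]:
    """Duration in seconds and RMS energy; a placeholder for an audio encoder."""
    duration = len(wave) / SAMPLE_RATE
    rms = math.sqrt(sum(x * x for x in wave) / len(wave)) if wave else 0.0
    return [duration, rms]


@dataclass
class SADExample:
    text: str
    label: int
    features: List[float]  # text features followed by synthetic-audio features


def build_sad_dataset(
    corpus: Iterable[Tuple[str, int]],
    tts: Callable[[str], List[float]] = placeholder_tts,
) -> List[SADExample]:
    """Pair each text-only example with zero-shot synthetic audio (no gold audio needed)."""
    dataset = []
    for text, label in corpus:
        wave = tts(text)
        fused = text_features(text) + audio_features(wave)  # early fusion by concatenation
        dataset.append(SADExample(text, label, fused))
    return dataset


if __name__ == "__main__":
    corpus = [("I think it might rain", 0), ("It will definitely rain?", 1)]
    for ex in build_sad_dataset(corpus):
        print(ex.label, len(ex.features), round(ex.features[-2], 2))
```

Because the TTS system is passed in as a callable, swapping `placeholder_tts` for a real TTS model (and the two feature functions for pretrained encoders) would leave `build_sad_dataset` itself unchanged.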

Code & Models

Repositories

adil-soubki/sad-training (official)

Videos

Synthetic Audio Helps for Cognitive State Tasks · Underline

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Speech and dialogue systems · Visual and Cognitive Learning Processes