Measuring Sound Symbolism in Audio-visual Models

Wei-Cheng Tseng; Yi-Jen Shih; David Harwath; Raymond Mooney

arXiv:2409.12306·cs.CL·November 13, 2024

TL;DR

This paper investigates whether pre-trained audio-visual models exhibit sound symbolism, revealing that models trained on speech data can capture sound-meaning associations similar to human language processing.

Contribution

It introduces a specialized dataset and a non-parametric evaluation method to assess sound symbolism in pre-trained audio-visual models, highlighting their ability to reflect human-like sound-meaning connections.

Findings

01. Models trained on speech data show significant sound symbolism patterns.

02. A new dataset with synthesized images and audio was developed for evaluation.

03. Pre-trained models can capture non-arbitrary sound-meaning associations.

Abstract

Audio-visual pre-trained models have gained substantial attention recently and demonstrated superior performance on various audio-visual tasks. This study investigates whether pre-trained audio-visual models demonstrate non-arbitrary associations between sounds and visual representations, known as sound symbolism, which is also observed in humans. We developed a specialized dataset with synthesized images and audio samples and assessed these models using a non-parametric approach in a zero-shot setting. Our findings reveal a significant correlation between the models' outputs and established patterns of sound symbolism, particularly in models trained on speech data. These results suggest that such models can capture sound-meaning connections akin to human language processing, providing insights into both cognitive architectures and machine learning…
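The abstract does not spell out the non-parametric procedure, so the following is only a minimal sketch of what a zero-shot, non-parametric check of this kind could look like. It assumes each pseudoword's audio embedding is compared with the embeddings of a round and a spiky image, and that the resulting scores for two hypothesized word groups are compared with a one-sided permutation test. The function names (`cosine`, `roundness_score`, `permutation_test`), the round/spiky framing, and the choice of a permutation test are illustrative assumptions, not the authors' method.

```python
import random


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def roundness_score(audio_emb, round_img_emb, spiky_img_emb):
    """Hypothetical zero-shot score: positive when the audio embedding
    is closer to the round image than to the spiky image."""
    return cosine(audio_emb, round_img_emb) - cosine(audio_emb, spiky_img_emb)


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """One-sided permutation test on the difference in group means.

    Returns (observed_difference, p_value), where the p-value estimates
    how often a random relabeling of the pooled scores yields a
    difference at least as large as the observed one.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up scores for illustration only, not data from the paper.
    round_like = [0.9, 0.8, 0.85, 0.95, 0.7, 0.88]
    spiky_like = [0.1, 0.2, 0.15, 0.05, 0.3, 0.12]
    print(permutation_test(round_like, spiky_like))
```

A permutation test is used here because it makes no distributional assumption about the similarity scores; a rank-based statistic would be an equally plausible reading of "non-parametric".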

Taxonomy

Topics: Music Technology and Sound Studies · Music and Audio Processing

Methods: Softmax · Attention Is All You Need