Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

Soham Deshmukh, Shuo Han, Hazim Bukhari, Benjamin Elizalde, Hannes Gamper, Rita Singh, Bhiksha Raj

arXiv:2407.18062 · cs.SD · July 26, 2024

TL;DR

This paper introduces the task of Audio Entailment to evaluate the deductive reasoning ability of Audio-Language Models, revealing their limitations and proposing a captioning-based intermediate step to improve reasoning performance.

Contribution

The paper defines a new benchmark for logical reasoning in audio understanding and demonstrates how captioning can enhance ALMs' reasoning capabilities.
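The captioning-based intermediate step (caption-before-reason) can be pictured as a two-stage pipeline. First, an audio model turns the recording into a text caption. Then a text-only reasoner compares that caption with the hypothesis and outputs one of three conclusions: entailment, neutral, or contradiction. The Python sketch below illustrates only that control flow. `CaptionBeforeReason`, `toy_captioner`, and `toy_reasoner` are hypothetical names. The stubs are toys rather than real models, and this excerpt does not describe the models or prompts the authors actually use.

```python
from dataclasses import dataclass
from typing import Callable

# Three-way label space of the Audio Entailment task.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass
class CaptionBeforeReason:
    """Hypothetical two-stage pipeline: caption the audio, then reason over text only."""

    captioner: Callable[[str], str]      # audio path -> caption (stand-in for an ALM)
    reasoner: Callable[[str, str], str]  # (caption, hypothesis) -> label (stand-in for a text reasoner)

    def predict(self, audio_path: str, hypothesis: str) -> str:
        caption = self.captioner(audio_path)        # Step 1: describe the premise audio in text.
        label = self.reasoner(caption, hypothesis)  # Step 2: judge the hypothesis against the caption.
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        return label


# --- Toy stubs for illustration only; neither is a real model. ---
def toy_captioner(audio_path: str) -> str:
    # A real system would run an audio captioning model on the file here.
    return {"dog.wav": "a dog barks while cars pass by"}.get(audio_path, "")


def toy_reasoner(caption: str, hypothesis: str) -> str:
    # Naive string heuristic standing in for a language model's entailment judgment.
    if hypothesis in caption:
        return "entailment"
    if "silence" in hypothesis and caption:
        return "contradiction"
    return "neutral"


pipeline = CaptionBeforeReason(captioner=toy_captioner, reasoner=toy_reasoner)
print(pipeline.predict("dog.wav", "a dog barks"))       # entailment
print(pipeline.predict("dog.wav", "a cat meows"))       # neutral
print(pipeline.predict("dog.wav", "complete silence"))  # contradiction
```

Keeping the captioner and reasoner as swappable callables shows the design point of the intermediate step: the final judgment depends only on text, so any audio captioner can be paired with any text reasoner.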

Findings

1. ALMs show deficiencies in logical reasoning tasks.
2. Caption-before-reason improves reasoning performance.
3. Benchmark datasets reveal reasoning limitations in current models.

Abstract

Recent literature uses language to build foundation models for audio. These Audio-Language Models (ALMs) are trained on a vast number of audio-text pairs and show remarkable performance in tasks including Text-to-Audio Retrieval, Captioning, and Question Answering. However, their ability to engage in more complex open-ended tasks, like Interactive Question-Answering, requires proficiency in logical reasoning, a skill not yet benchmarked. We introduce the novel task of Audio Entailment to evaluate an ALM's deductive reasoning ability. This task assesses whether a text description (hypothesis) of audio content can be deduced from an audio recording (premise), with potential conclusions being entailment, neutral, or contradiction, depending on the sufficiency of the evidence. We create two datasets for this task with audio recordings sourced from two audio captioning datasets: …
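The task definition amounts to three-way classification over (premise audio, hypothesis text) pairs. The minimal standard-library Python sketch below models one item and scores predictions with overall and per-class accuracy. `EntailmentExample`, `accuracy`, and `per_class_accuracy` are hypothetical names, and the file names are placeholders. Accuracy is an illustrative choice, not necessarily the metric the authors report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Three possible conclusions for a (premise, hypothesis) pair.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class EntailmentExample:
    """One hypothetical benchmark item: audio premise, text hypothesis, gold label."""

    premise_audio: str  # path or ID of the audio recording (the premise)
    hypothesis: str     # text description to be judged against the audio
    label: str          # gold conclusion, one of LABELS

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"label must be one of {LABELS}, got {self.label!r}")


def _check_lengths(examples: Sequence[EntailmentExample], predictions: Sequence[str]) -> None:
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")


def accuracy(examples: Sequence[EntailmentExample], predictions: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    _check_lengths(examples, predictions)
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(ex.label == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)


def per_class_accuracy(
    examples: Sequence[EntailmentExample], predictions: Sequence[str]
) -> Dict[str, float]:
    """Accuracy restricted to each gold label (labels absent from the data are skipped)."""
    _check_lengths(examples, predictions)
    total = Counter(ex.label for ex in examples)
    correct = Counter(ex.label for ex, pred in zip(examples, predictions) if ex.label == pred)
    return {lab: correct[lab] / total[lab] for lab in LABELS if total[lab]}


data: List[EntailmentExample] = [
    EntailmentExample("clip.wav", "a dog barks", "entailment"),
    EntailmentExample("clip.wav", "a dog barks at a mail carrier", "neutral"),
    EntailmentExample("clip.wav", "the recording is silent", "contradiction"),
]
preds = ["entailment", "entailment", "contradiction"]
print(accuracy(data, preds))            # 0.6666666666666666
print(per_class_accuracy(data, preds))  # {'entailment': 1.0, 'neutral': 0.0, 'contradiction': 1.0}
```

Reporting per-class accuracy alongside the overall figure shows when a model systematically misses one conclusion type. In the toy data above, the model predicts entailment where the audio only supports neutral.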

Taxonomy

Topics: Subtitles and Audiovisual Media