Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model

Ziyang Ma; Zhuo Chen; Yuping Wang; Eng Siong Chng; Xie Chen

arXiv:2501.07246·cs.SD·January 14, 2025

TL;DR

This paper investigates integrating Chain-of-Thought reasoning into Large Audio-Language Models to improve their complex reasoning abilities across audio tasks, revealing both benefits and limitations of current methods.

Contribution

First exploration of Chain-of-Thought reasoning in Large Audio-Language Models, analyzing its impact and challenges across auditory perception and understanding tasks.

Findings

01. CoT methods improve performance on easy and medium tasks.

02. Reasoning chain length positively correlates with accuracy.

03. Challenges remain in hard tasks, where reasoning can cause confusion.
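The second finding is a statement about correlation, and a small sketch may help make it concrete. The snippet below computes a Pearson coefficient between reasoning chain length and correctness. The records are hypothetical values invented for illustration, not data from the paper, and counting "reasoning steps" is an assumed proxy for chain length; the paper's own measurement may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical records (NOT from the paper):
# (number of reasoning steps in the model output, 1 if the answer was correct else 0)
records = [(1, 0), (2, 0), (2, 1), (3, 1), (4, 1), (5, 1)]

chain_lengths = [steps for steps, _ in records]
correct = [hit for _, hit in records]

r = pearson(chain_lengths, correct)
print(f"correlation between chain length and correctness: {r:.3f}")
```

With a binary correctness column, this is the point-biserial correlation. A positive value only indicates that longer chains co-occur with correct answers in the sample; it says nothing about causation.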

Abstract

Large Audio-Language Models (LALMs) have demonstrated remarkable performance in tasks involving audio perception and understanding, such as speech recognition and audio captioning. However, their reasoning capabilities - critical for solving complex real-world problems - remain underexplored. In this work, we conduct the first exploration into integrating Chain-of-Thought (CoT) reasoning into LALMs to enhance their reasoning ability across auditory modalities. We evaluate representative CoT methods, analyzing their performance in both information extraction and reasoning tasks across sound, music, and speech domains. Our findings reveal that CoT methods significantly improve performance on easy and medium tasks but encounter challenges with hard tasks, where reasoning chains can confuse the model rather than improve accuracy. Additionally, we identify a positive correlation between…
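The comparison the abstract describes, answering directly versus answering after a reasoning chain, can be sketched at the prompt level. The snippet below is a minimal illustration under stated assumptions: `AudioQuestion`, `build_prompt`, and the multiple-choice layout are hypothetical names and formats, and the trigger phrase is the widely used zero-shot CoT wording, which may not match the prompts evaluated in the paper. Passing the audio itself to a model is left out because it depends on the specific LALM interface.

```python
from dataclasses import dataclass
from typing import List

# Commonly used zero-shot CoT trigger phrase (an assumption here,
# not necessarily the wording used in the paper).
COT_TRIGGER = "Let's think step by step."


@dataclass
class AudioQuestion:
    """Hypothetical container for one multiple-choice audio question."""
    audio_path: str
    question: str
    choices: List[str]


def build_prompt(item: AudioQuestion, use_cot: bool) -> str:
    """Build the text part of the prompt, with or without a CoT trigger."""
    options = "\n".join(
        f"({chr(ord('A') + i)}) {choice}" for i, choice in enumerate(item.choices)
    )
    base = f"Question: {item.question}\nOptions:\n{options}\n"
    if use_cot:
        # Ask for reasoning first, then a final answer that is easy to parse.
        return base + COT_TRIGGER + " Then give the final answer as a single option letter."
    return base + "Answer with a single option letter."


item = AudioQuestion(
    audio_path="example.wav",  # hypothetical file name
    question="Which instrument is playing?",
    choices=["Piano", "Violin", "Drums"],
)

direct_prompt = build_prompt(item, use_cot=False)
cot_prompt = build_prompt(item, use_cot=True)
print(cot_prompt)
```

Keeping the two prompts identical apart from the trigger makes it easier to attribute any accuracy difference to the reasoning chain rather than to formatting.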
