Auditory Intelligence: Understanding the World Through Sound
Hyeonuk Nam

TL;DR
This paper redefines auditory intelligence as a layered, contextual process involving perception, reasoning, and interaction, proposing four new task paradigms to advance generalizable and explainable sound understanding.
Contribution
It introduces four cognitively inspired task paradigms—ASPIRE, SODA, AUX, and AUGMENT—that structure auditory understanding in a comprehensive, layered framework.
Findings
Proposes a layered, situated model of auditory intelligence.
Introduces four new task paradigms for sound understanding.
Aims to enhance generalizability and explainability in auditory AI.
Abstract
Recent progress in auditory intelligence has yielded high-performing systems for sound event detection (SED), acoustic scene classification (ASC), automated audio captioning (AAC), and audio question answering (AQA). Yet these tasks remain largely constrained to surface-level recognition: they capture what happened but not why, what it implies, or how it unfolds in context. I propose a conceptual reframing of auditory intelligence as a layered, situated process that encompasses perception, reasoning, and interaction. To instantiate this view, I introduce four cognitively inspired task paradigms (ASPIRE, SODA, AUX, and AUGMENT) that structure auditory understanding across time-frequency pattern captioning, hierarchical event/scene description, causal explanation, and goal-driven interpretation, respectively. Together, these paradigms provide a roadmap toward more generalizable, explainable,…
Taxonomy
Topics: Music and Audio Processing · Neuroscience and Music Perception · Emotion and Mood Recognition
