Learning Mid-Level Auditory Codes from Natural Sound Statistics
Wiktor Młynarski, Josh H. McDermott

TL;DR
This paper introduces a hierarchical generative model that learns complex mid-level auditory features from natural sounds, providing insights into possible neuronal computations in the auditory cortex.
Contribution
The paper presents a hierarchical generative model that captures mid-level auditory representations by learning combinations of spectrotemporal features from natural sound statistics.
Findings
Second-layer units encode complex acoustic patterns.
Some units group spectrotemporal features occurring together.
Others show opponency between dissimilar features.
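To make the distinction between grouping and opponency concrete, here is a toy sketch. It is not the authors' model: the feature names, the weights, and the assumption that a second-layer unit is a simple weighted sum of first-layer activations are all hypothetical, chosen only to illustrate the two kinds of units described above.

```python
# Toy illustration only; NOT the model from the paper.
# Feature names, weights, and the linear readout are hypothetical assumptions.

def unit_response(weights, activations):
    """Weighted sum of first-layer feature activations (assumed linear readout)."""
    return sum(w * a for w, a in zip(weights, activations))

# Four hypothetical first-layer spectrotemporal features.
FEATURES = ["upward_sweep", "harmonic_stack", "onset_click", "noise_burst"]

# "Grouping" unit: same-sign weights on two features assumed to co-occur.
grouping_unit = [1.0, 1.0, 0.0, 0.0]
# "Opponent" unit: opposite-sign weights on two dissimilar features.
opponent_unit = [0.0, 0.0, 1.0, -1.0]

# Hypothetical first-layer activations for three sounds.
sound_a = [0.8, 0.6, 0.0, 0.0]  # sweep and harmonic stack present together
sound_b = [0.0, 0.0, 0.9, 0.0]  # click only
sound_c = [0.0, 0.0, 0.0, 0.7]  # noise burst only

if __name__ == "__main__":
    for name, sound in [("a", sound_a), ("b", sound_b), ("c", sound_c)]:
        g = unit_response(grouping_unit, sound)
        o = unit_response(opponent_unit, sound)
        print(f"sound {name}: grouping={g:+.2f} opponent={o:+.2f}")
```

In this sketch the grouping unit responds only when its two preferred features are active, while the opponent unit is driven in opposite directions by the click and the noise burst.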
Abstract
Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. Although models exist in the visual domain to explain how mid-level features such as junctions and curves might be derived from oriented filters in early visual cortex, little is known about analogous grouping principles for mid-level auditory representations. We propose a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer the model…
Taxonomy
Topics: Neural dynamics and brain function · Neuroscience and Music Perception · Hearing Loss and Rehabilitation
