Combolutional Neural Networks
Cameron Churchwell, Minje Kim, Paris Smaragdis

TL;DR
This paper introduces the combolutional layer, a learned-delay IIR comb filter that extracts harmonic features from audio, and demonstrates its effectiveness and efficiency on three audio information retrieval tasks.
Contribution
The paper presents the combolutional layer as a new harmonic feature extractor and a replacement for convolutional layers in audio tasks where precise harmonic analysis is important.
Findings
Effective in piano transcription, speaker classification, and key detection
Low parameter count and efficient CPU inference
Improved interpretability over existing frontends
Abstract
Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low…
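The abstract describes the layer only at a high level. As a rough illustration of the idea, the NumPy sketch below assumes a standard feedback comb filter, y[n] = x[n] + alpha * y[n - delay], with a fractional delay realized by linear interpolation, fused in the same loop with a simple rectify-and-smooth envelope follower. The function name `comb_envelope` and the parameters `alpha` and `smooth` are hypothetical. In the paper the delay is learned during training, whereas here it is a fixed argument, and the authors' exact filter and envelope detector may differ.

```python
import numpy as np


def comb_envelope(x, delay, alpha=0.9, smooth=0.99):
    """Hypothetical sketch: feedback comb filter fused with an envelope follower.

    y[n] = x[n] + alpha * y[n - delay], where a non-integer delay is realized
    by linearly interpolating between the two nearest past outputs.
    env[n] is a one-pole lowpass of |y[n]|, computed in the same loop.
    """
    if delay < 1.0:
        raise ValueError("delay must be at least one sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) for a stable filter")
    k = int(np.floor(delay))  # integer part of the delay
    frac = delay - k          # fractional part, used for interpolation
    y = np.zeros(len(x))
    env = np.zeros(len(x))
    e = 0.0
    for n in range(len(x)):
        # Interpolated feedback term y[n - delay]; outputs before n = 0 are zero.
        y_k = y[n - k] if n - k >= 0 else 0.0
        y_k1 = y[n - k - 1] if n - k - 1 >= 0 else 0.0
        fb = (1.0 - frac) * y_k + frac * y_k1
        y[n] = x[n] + alpha * fb
        # Envelope follower: full-wave rectify, then one-pole lowpass smoothing.
        e = smooth * e + (1.0 - smooth) * abs(y[n])
        env[n] = e
    return y, env


# Usage: a delay of 80 samples at 16 kHz places resonant peaks at multiples of 200 Hz.
fs = 16000
t = np.arange(fs) / fs
matched = np.sin(2 * np.pi * 200 * t)     # period of exactly 80 samples
mismatched = np.sin(2 * np.pi * 300 * t)  # falls midway between two comb peaks
_, env_a = comb_envelope(matched, delay=80.0)
_, env_b = comb_envelope(mismatched, delay=80.0)
print(env_a[-1] > env_b[-1])  # True
```

With alpha = 0.9, the comb's gain is 1 / (1 - 0.9) = 10 at its peaks and 1 / (1 + 0.9) ≈ 0.53 midway between them, so the envelope of the matched tone ends up far larger. This is the kind of harmonic selectivity a learned delay could tune toward a pitch of interest.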
