TL;DR
This paper introduces a point cloud-based approach to audio processing that achieves invariance to input representation parameters, enabling flexible, smaller models with minimal performance loss across different sampling rates and representations.
Contribution
The authors propose a novel point cloud method for audio processing that is invariant to representation choices and allows for effective subsampling, unlike traditional fixed-dimensional models.
Findings
Point cloud models are smaller than fixed-dimensional models, with minimal performance loss.
Performance remains stable when the input points are subsampled.
The models are invariant to the DFT size and the sampling rate.
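To make the subsampling and invariance findings concrete, here is a minimal sketch of a permutation-invariant point encoder. It is not the authors' architecture: it assumes a PointNet-style design (a shared per-point MLP followed by max pooling), and the names `init_encoder` and `encode`, the layer sizes, and the random weights are all hypothetical. The point it illustrates is that the output has a fixed size no matter how many points go in or in what order.

```python
import numpy as np


def init_encoder(rng, in_dim=3, hidden=16, out_dim=8):
    """Random weights for a tiny shared per-point MLP (illustrative only)."""
    return {
        "w1": rng.normal(scale=0.5, size=(in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.5, size=(hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }


def encode(points, params):
    """Map an (N, in_dim) point set to a fixed-size (out_dim,) embedding.

    The same MLP is applied to every point, then max pooling over the
    point axis removes any dependence on point order or point count.
    """
    h = np.maximum(points @ params["w1"] + params["b1"], 0.0)  # ReLU
    h = np.maximum(h @ params["w2"] + params["b2"], 0.0)       # ReLU
    return h.max(axis=0)


rng = np.random.default_rng(0)
params = init_encoder(rng)
cloud = rng.normal(size=(500, 3))                        # stand-in for audio points
subset = cloud[rng.choice(500, size=50, replace=False)]  # random subsample

full_embedding = encode(cloud, params)
subset_embedding = encode(subset, params)
print(full_embedding.shape, subset_embedding.shape)
```

Because the pooled embedding has the same shape for 500 points and for 50, the same downstream layers can consume either input, which is what makes subsampling possible without changing the model.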
Abstract
Most audio processing pipelines involve transformations that act on fixed-dimensional input representations of audio. For example, when using the Short-Time Fourier Transform (STFT), the DFT size specifies a fixed dimension for the input representation. As a consequence, most audio machine learning models are designed to process fixed-size vector inputs, which often prohibits the repurposing of learned models on audio with different sampling rates or alternative representations. We note, however, that the intrinsic spectral information in the audio signal is invariant to the choice of the input representation or the sampling rate. Motivated by this, we introduce a novel way of processing audio signals by treating them as a collection of points in feature space, and we use point cloud machine learning models that give us invariance to the choice of representation parameters, such as DFT…
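The idea of treating audio as a collection of points can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact feature construction: it assumes each STFT bin becomes one point with coordinates (time in seconds, frequency in Hz, normalized magnitude), and the function names `stft_point_cloud` and `subsample`, the Hann window, and the 50% hop are all hypothetical choices. Because the coordinates are physical units rather than bin indices, a 440 Hz tone lands near the same frequency coordinate whatever the DFT size or sampling rate.

```python
import numpy as np


def stft_point_cloud(signal, sample_rate, dft_size, hop=None):
    """Return an (N, 3) array of (time_s, freq_hz, magnitude) points.

    Assumed construction: one point per STFT bin, with coordinates in
    seconds and Hz so they do not depend on the DFT size.
    """
    hop = hop or dft_size // 2
    window = np.hanning(dft_size)
    n_frames = 1 + (len(signal) - dft_size) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + dft_size] * window for i in range(n_frames)]
    )
    # Divide by the window sum so magnitudes are comparable across DFT sizes.
    mags = np.abs(np.fft.rfft(frames, axis=1)) / window.sum()
    freqs = np.fft.rfftfreq(dft_size, d=1.0 / sample_rate)        # Hz
    times = (np.arange(n_frames) * hop + dft_size / 2) / sample_rate  # s
    t, f = np.meshgrid(times, freqs, indexing="ij")
    return np.stack([t.ravel(), f.ravel(), mags.ravel()], axis=1)


def subsample(points, n, rng):
    """Keep n points chosen uniformly at random without replacement."""
    idx = rng.choice(len(points), size=n, replace=False)
    return points[idx]


# Same 440 Hz tone, two sampling rates and two DFT sizes.
for sample_rate, dft_size in [(16000, 512), (8000, 128)]:
    n = np.arange(sample_rate)  # one second of audio
    tone = np.sin(2 * np.pi * 440.0 * n / sample_rate)
    cloud = stft_point_cloud(tone, sample_rate, dft_size)
    peak_hz = cloud[np.argmax(cloud[:, 2]), 1]
    print(sample_rate, dft_size, cloud.shape, peak_hz)
```

The number of points differs between the two settings, but each point carries the same kind of information, so a model that accepts variable-size point sets can consume either cloud, or a random subset produced by `subsample`.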
