TL;DR
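The paper introduces the joint time-frequency scattering transform, a time-shift-invariant audio descriptor obtained by applying a two-dimensional wavelet transform in time and log-frequency to a wavelet scalogram. It reports state-of-the-art results for signal reconstruction and phone segment classification on the TIMIT dataset.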
Contribution
It introduces the joint time-frequency scattering transform, a time-shift-invariant descriptor of time-frequency structure intended for audio classification.
Findings
Characterizes complex time-frequency phenomena such as time-varying filters and frequency-modulated excitations
Reports state-of-the-art results on the TIMIT dataset
Evaluated on signal reconstruction and phone segment classification
Abstract
We introduce the joint time-frequency scattering transform, a time shift invariant descriptor of time-frequency structure for audio classification. It is obtained by applying a two-dimensional wavelet transform in time and log-frequency to a time-frequency wavelet scalogram. We show that this descriptor successfully characterizes complex time-frequency phenomena such as time-varying filters and frequency modulated excitations. State-of-the-art results are achieved for signal reconstruction and phone segment classification on the TIMIT dataset.
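The construction described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian band-pass filters stand in for analytic wavelets, the function names (`gaussian_bank`, `scalogram`, `joint_scattering`) and all parameter values are hypothetical, convolutions are circular along both axes, and the time averaging is a global mean rather than a local low-pass filter.

```python
import numpy as np

def gaussian_bank(n, centers, q):
    """Gaussian band-pass filters sampled in the Fourier domain (one row per center).
    Assumption: these stand in for analytic wavelets; they are not exact Morlet wavelets."""
    freqs = np.fft.fftfreq(n)                       # cycles per sample, in [-0.5, 0.5)
    centers = np.asarray(centers, dtype=float)[:, None]
    sigma = np.abs(centers) / q                     # bandwidth proportional to center frequency
    return np.exp(-0.5 * ((freqs[None, :] - centers) / sigma) ** 2)

def scalogram(x, n_bands=32, octaves=5, f_max=0.4, q=8.0):
    """Wavelet scalogram: modulus of band-pass outputs, rows uniformly spaced in log-frequency."""
    centers = f_max * 2.0 ** (-np.linspace(0.0, octaves, n_bands))
    bank = gaussian_bank(len(x), centers, q)
    return np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * bank, axis=1))   # (n_bands, n)

def joint_scattering(x, rates=(0.005, 0.01, 0.02), scales=(0.0625, 0.125, 0.25), q=4.0):
    """Apply separable 2-D band-pass filters (time x log-frequency) to the scalogram,
    take the modulus, then average over time to obtain time-shift invariance."""
    S = scalogram(x)
    n_bands, n = S.shape
    psi_t = gaussian_bank(n, rates, q)              # temporal filters, positive rates only
    signed_scales = np.concatenate([np.asarray(scales), -np.asarray(scales)])
    psi_f = gaussian_bank(n_bands, signed_scales, q)  # log-frequency filters, both signs
    # Outer product of the 1-D filters: shape (n_rates, 2 * n_scales, n_bands, n)
    filters = psi_f[None, :, :, None] * psi_t[:, None, None, :]
    Y = np.abs(np.fft.ifft2(np.fft.fft2(S)[None, None] * filters))
    return Y.mean(axis=-1)                          # global time average (a simplification)

# Usage: a synthetic chirp whose instantaneous frequency rises over time
t = np.arange(2048)
x = np.cos(2 * np.pi * (0.05 * t + 0.00004 * t ** 2))
coeffs = joint_scattering(x)
print(coeffs.shape)  # (3, 6, 32): temporal rate, signed log-frequency scale, log-frequency band
```

Both signs of the log-frequency filter are kept so that upward and downward frequency modulations can produce different responses. Because every step here is a circular convolution followed by a modulus and a global average, the output is unchanged by circular time shifts of the input.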
