PERSA+: A Deep Learning Front-End for Context-Agnostic Audio Classification
Lazaros Vrysis, Iordanis Thoidis, Charalampos Dimoulas, and George Papanikolaou

TL;DR
This paper introduces PERSA+, a deep learning front-end that discards detrimental information from raw audio before it reaches the modeling stage, aiming to improve robustness and generalization in real-world audio classification tasks.
Contribution
The work proposes a novel preprocessing front-end that enhances deep learning models' ability to perform reliably across diverse and unpredictable audio environments.
Findings
Improved robustness in real-world audio classification
Enhanced generalization across different audio contexts
Reduced detrimental information entering the modeling stage
Abstract
Deep learning has been applied to diverse audio semantics tasks, enabling the construction of models that learn hierarchical levels of features from high-dimensional raw data and deliver state-of-the-art performance. But do these algorithms perform similarly in real-world conditions, or only on benchmarks, where their high learning capability ensures complete memorization of the employed datasets? This work presents a deep learning front-end that aims to discard detrimental information before it enters the modeling stage, bringing the learning process closer to the point and paving the way for robust, context-agnostic classification algorithms.
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
