Automatic Organisation, Segmentation, and Filtering of User-Generated Audio Content
Gonçalo Mordido, João Magalhães, Sofia Cavaco

TL;DR
This paper introduces methods for organizing, segmenting, and filtering large datasets of user-generated audio content using audio fingerprinting and supervised learning, validated on concert recordings from YouTube.
Contribution
It presents novel techniques for grouping and analyzing user-generated audio files based solely on fingerprinting data, including error detection with supervised learning.
Findings
Effective grouping of audio files from large datasets
Supervised learning reduces incorrect fingerprint matches
Validated methods on YouTube concert recordings
Abstract
Using only the information retrieved by audio fingerprinting techniques, we propose methods for processing a possibly large dataset of user-generated audio content. These methods (1) group audio files that contain a common audio excerpt (i.e., that relate to the same event), and (2) provide information about how those files are correlated in terms of time and quality within each event. Furthermore, we use supervised learning to detect incorrect matches that may arise from the audio fingerprinting algorithm itself, whilst ensuring our model learns from previous predictions. All the presented methods were validated on user-generated recordings of several different concerts manually crawled from YouTube.
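The grouping and time-correlation steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the data format or the algorithm, so the input shape (pairwise matches as `(file_a, file_b, offset_seconds)` tuples), the function name `group_events`, and the use of connected components over the match graph are all assumptions made for illustration. Files linked by a chain of matches are placed in the same event, and each file receives a start time relative to the earliest file in that event.

```python
from collections import defaultdict


def group_events(matches):
    """Group files into events and place them on a per-event timeline.

    `matches` is a hypothetical list of (file_a, file_b, offset_seconds)
    tuples, where offset_seconds is the start of file_b relative to the
    start of file_a, as a fingerprinting tool might report it.
    """
    # Build an undirected match graph; the reverse edge negates the offset.
    adjacency = defaultdict(list)
    for a, b, offset in matches:
        adjacency[a].append((b, offset))
        adjacency[b].append((a, -offset))

    events, seen = [], set()
    for ref in sorted(adjacency):
        if ref in seen:
            continue
        # Traverse one connected component, accumulating start times
        # relative to the reference file.
        start = {ref: 0.0}
        stack = [ref]
        while stack:
            cur = stack.pop()
            for nxt, off in adjacency[cur]:
                if nxt not in start:
                    start[nxt] = start[cur] + off
                    stack.append(nxt)
        seen.update(start)
        # Shift so that the earliest file in the event starts at 0.
        earliest = min(start.values())
        events.append({f: t - earliest for f, t in start.items()})
    return events


# Made-up file names and offsets, for illustration only.
example = [
    ("a.wav", "b.wav", 10.0),
    ("b.wav", "c.wav", 5.0),
    ("x.wav", "y.wav", -3.0),
]
events = group_events(example)
```

This sketch trusts every reported match. An incorrect match would merge two unrelated events, which is the kind of error the supervised filtering step in the abstract is meant to catch before grouping.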
