Audio Fingerprinting with Holographic Reduced Representations
Yusuke Fujita, Tatsuya Komatsu

TL;DR
This paper introduces an audio fingerprinting approach that uses holographic reduced representations (HRR) to reduce the number of stored fingerprints, with only modest accuracy degradation and no loss of time resolution.
Contribution
It presents a novel HRR-based aggregation method that combines multiple audio fingerprints into a composite fingerprint of the same dimensionality, reducing storage needs while retaining the original time resolution at a modest cost in accuracy.
Findings
Reduces number of fingerprints needed for high accuracy
Maintains time resolution with fewer fingerprints
Outperforms simple aggregation methods
Abstract
This paper proposes an audio fingerprinting model with holographic reduced representation (HRR). The proposed method reduces the number of stored fingerprints, whereas conventional neural audio fingerprinting requires many fingerprints for each audio track to achieve high accuracy and time resolution. We utilize HRR to aggregate multiple fingerprints into a composite fingerprint via circular convolution and summation, resulting in fewer fingerprints with the same dimensional space as the original. Our search method efficiently finds a combined fingerprint in which a query fingerprint exists. Using HRR's inverse operation, it can recover the relative position within a combined fingerprint, retaining the original time resolution. Experiments show that our method can reduce the number of fingerprints with modest accuracy degradation while maintaining the time resolution, outperforming…
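
The aggregation and position-recovery idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensionality, the number of aggregated segments, the use of random unit vectors as stand-ins for learned fingerprints, the random position keys, and every function name below are assumptions made for illustration. Each fingerprint is bound to a position key by circular convolution, the bound vectors are summed into one composite, and HRR's approximate inverse (involution) is applied to a query to estimate which position key it was bound to.

```python
import math
import random

DIM = 512          # fingerprint dimensionality (assumed for illustration)
NUM_SEGMENTS = 4   # fingerprints aggregated into one composite (assumed)


def random_unit_vector(dim, rng):
    """Draw a Gaussian vector and scale it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def circular_convolution(a, b):
    """HRR binding: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a):
    """Approximate inverse under circular convolution: a*[k] = a[-k mod n]."""
    n = len(a)
    return [a[(-k) % n] for k in range(n)]


def aggregate(position_keys, fingerprints):
    """Bind each fingerprint to its position key and sum the results.

    The composite has the same dimensionality as a single fingerprint.
    """
    composite = [0.0] * len(position_keys[0])
    for key, fp in zip(position_keys, fingerprints):
        bound = circular_convolution(key, fp)
        composite = [c + b for c, b in zip(composite, bound)]
    return composite


def recover_position(composite, query, position_keys):
    """Unbind the query from the composite and pick the best-matching key."""
    decoded = circular_convolution(involution(query), composite)
    scores = [sum(d * k for d, k in zip(decoded, key)) for key in position_keys]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical data: random vectors stand in for learned fingerprints.
rng = random.Random(0)
position_keys = [random_unit_vector(DIM, rng) for _ in range(NUM_SEGMENTS)]
fingerprints = [random_unit_vector(DIM, rng) for _ in range(NUM_SEGMENTS)]

composite = aggregate(position_keys, fingerprints)
recovered = [recover_position(composite, fp, position_keys) for fp in fingerprints]
print(recovered)
```

In this sketch each query is an exact copy of a stored fingerprint; a real query would be a noisy match, and the decoded vector is only an approximation of the position key, so recovery becomes less reliable as more fingerprints are summed into one composite or as the dimensionality shrinks. The quadratic-time convolution is used only to keep the example dependency-free; an FFT-based version would be the usual choice at scale.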
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection
Methods: Holographic Reduced Representation · Convolution
