GraFPrint: A GNN-Based Approach for Audio Identification

Aditya Bhattacharjee; Shubhr Singh; Emmanouil Benetos

arXiv:2410.10994 · cs.SD · January 27, 2025

TL;DR

GraFPrint is a GNN-based audio identification framework that builds audio fingerprints from k-NN graphs over time-frequency representations and trains them with self-supervised contrastive learning. The authors report strong performance on large-scale datasets and resilience to ambient distortions.

Contribution

The paper introduces GraFPrint, a GNN-based audio identification method that constructs k-NN graphs from time-frequency representations, applies max-relative graph convolutions, and uses self-supervised contrastive training for improved robustness and scalability.

Findings

1. Outperforms existing methods on large-scale datasets
2. Resilient to ambient distortions due to contrastive training
3. Lightweight and scalable for real-world applications

Abstract

This paper introduces GraFPrint, an audio identification framework that leverages the structural learning capabilities of Graph Neural Networks (GNNs) to create robust audio fingerprints. Our method constructs a k-nearest neighbor (k-NN) graph from time-frequency representations and applies max-relative graph convolutions to encode local and global information. The network is trained using a self-supervised contrastive approach, which enhances resilience to ambient distortions by optimizing feature representation. GraFPrint demonstrates superior performance on large-scale datasets at various levels of granularity, proving to be both lightweight and scalable, making it suitable for real-world applications with extensive reference databases.
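The two graph steps named in the abstract, k-NN graph construction and max-relative graph convolution, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the official repository is PyTorch-based): each node is assumed to be a small feature vector taken from a patch of the time-frequency representation, neighbours are chosen by Euclidean distance, and the convolution follows the common max-relative formulation, in which each node's features are concatenated with the element-wise maximum of the differences to its neighbours and passed through a linear layer with ReLU. The names `knn_graph`, `max_relative_conv`, `weight`, and `bias` are hypothetical.

```python
import math


def knn_graph(nodes, k):
    """For each node, return the indices of its k nearest neighbours
    (Euclidean distance), excluding the node itself."""
    neighbours = []
    for i, xi in enumerate(nodes):
        dists = [(math.dist(xi, xj), j) for j, xj in enumerate(nodes) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def max_relative_conv(nodes, neighbours, weight, bias):
    """One max-relative graph convolution layer (assumed formulation):
    out_i = ReLU(W @ concat(x_i, max_j (x_j - x_i)) + b)."""
    out = []
    for xi, nbrs in zip(nodes, neighbours):
        # Element-wise maximum of the relative features over the neighbourhood.
        rel = [max(nodes[j][d] - xi[d] for j in nbrs) for d in range(len(xi))]
        concat = list(xi) + rel
        # Linear projection followed by ReLU.
        row = [sum(w * c for w, c in zip(w_row, concat)) + b
               for w_row, b in zip(weight, bias)]
        out.append([max(0.0, v) for v in row])
    return out


# Toy input: four 2-dimensional node features (stand-ins for patch embeddings).
nodes = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
neighbours = knn_graph(nodes, k=2)

# Identity projection so the concatenated features are visible in the output.
weight = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
bias = [0.0, 0.0, 0.0, 0.0]
updated = max_relative_conv(nodes, neighbours, weight, bias)

print(neighbours)  # [[1, 2], [0, 2], [0, 1], [2, 1]]
print(updated[0])  # [0.0, 0.0, 1.0, 2.0]
```

With the identity projection, the first half of each output row is the node's own feature and the second half is the max-relative term, which makes the local aggregation easy to inspect. In a trained network the projection would be learned, and several such layers would be stacked so that information propagates beyond the immediate neighbourhood.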

Code & Models

Repositories

chymaera96/GraFP (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing