An approach to hummed-tune and song sequences matching
Loc Bao Pham, Huong Hoang Luong, Phu Thien Tran, Phuc Hoang Ngo, Vi Hoang Nguyen, Thinh Nguyen

TL;DR
This paper presents a machine learning approach to hummed-tune recognition that combines deep neural network embeddings with efficient similarity search, reaching 94% MRR@10 on the public test set for song retrieval from hummed inputs.
Contribution
It introduces a complete pipeline for humming-based song matching, covering data preprocessing, embedding model training, and fast inference with Faiss, and it reports the top result on the public leaderboard.
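The retrieval step of such a pipeline can be sketched as a nearest-neighbour search over embeddings. The sketch below is a pure-Python brute-force stand-in for what a Faiss index would do at scale; it is not the authors' implementation. The use of cosine similarity, the function names `l2_normalize` and `top_k`, and the toy two-dimensional embeddings are all assumptions made for illustration, since the summary does not state which index type or distance metric was used.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, index, k=10):
    """Return the ids of the k songs whose embeddings are most similar to the query.

    `index` maps a song id to its embedding. Similarity is the inner product of
    L2-normalized vectors, i.e. cosine similarity (an assumed metric).
    """
    q = l2_normalize(query)
    scored = []
    for song_id, emb in index.items():
        e = l2_normalize(emb)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((score, song_id))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:k]]


# Hypothetical toy index: three songs with 2-D embeddings.
song_index = {"song_a": [1.0, 0.0], "song_b": [0.0, 1.0], "song_c": [1.0, 1.0]}
hum_embedding = [0.9, 0.1]
candidates = top_k(hum_embedding, song_index, k=2)
```

A brute-force scan like this is linear in the number of songs; a dedicated similarity-search library is typically used once the catalogue grows beyond what a full scan can serve quickly.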
Findings
94% MRR@10 on public test set
Top 1 result on public leaderboard
Effective use of ResNet, VGG, AlexNet, MobileNetV2
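For readers unfamiliar with the metric in the first finding, MRR@10 averages, over all queries, the reciprocal of the rank at which the correct song appears among the top 10 results, counting zero when it is absent. The following is a minimal sketch of that standard definition; the function name `mrr_at_k` and the toy data are hypothetical, and the challenge's official scoring script may differ in detail.

```python
def mrr_at_k(ranked_lists, ground_truth, k=10):
    """Mean reciprocal rank with cutoff k.

    ranked_lists: query id -> list of predicted song ids, best first.
    ground_truth: query id -> the single correct song id.
    """
    total = 0.0
    for query_id, ranked in ranked_lists.items():
        target = ground_truth[query_id]
        for rank, song_id in enumerate(ranked[:k], start=1):
            if song_id == target:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(ranked_lists)


# Hypothetical toy example: hit at rank 1, hit at rank 2, and a miss.
predictions = {"q1": ["a", "b"], "q2": ["c", "d", "e"], "q3": ["x"]}
truth = {"q1": "a", "q2": "d", "q3": "z"}
score = mrr_at_k(predictions, truth, k=10)
```

In the toy example the per-query reciprocal ranks are 1, 1/2, and 0, so the mean is 0.5.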
Abstract
A melody stuck in your head, also known as an "earworm", is tough to get rid of unless you listen to it again or sing it out loud. But what if you cannot find the name of that song? It must be an intolerable feeling. Recognizing a song's name based on a humming sound is not an easy task for a human being and should be done by machines. However, no research paper has been published on hummed-tune recognition. This work is adapted from the Hum2Song Zalo AI Challenge 2021, a competition on retrieving the name of a song from a user's hummed tune, which is similar to Google's Hum to Search. This paper details how the data are pre-processed from the original format (mp3) into a form usable for training and inference. To train an embedding model for the feature extraction phase, we ran experiments with several state-of-the-art architectures, such as ResNet, VGG, AlexNet, and MobileNetV2. And for the inference phase, we use the…
Taxonomy
Methods: Dense Connections · Depthwise Convolution · Softmax · Pointwise Convolution · Depthwise Separable Convolution · Dropout · Batch Normalization · 1x1 Convolution · Max Pooling · Inverted Residual Block
