Segment Length Matters: A Study of Segment Lengths on Audio Fingerprinting Performance

Ziling Gong; Yunyan Ouyang; Iram Kamdar; Melody Ma; Hongjie Chen; Franck Dernoncourt; Ryan A. Rossi; Nesreen K. Ahmed

arXiv:2601.17690·cs.SD·January 27, 2026

TL;DR

This study investigates how the length of audio segments influences the accuracy of neural audio fingerprinting, revealing that shorter segments often yield better retrieval performance and demonstrating the utility of LLMs in recommending optimal segment durations.

Contribution

The paper systematically analyzes the impact of segment length on fingerprinting accuracy and introduces LLM-based recommendations for optimal segment durations in audio retrieval systems.

Findings

01 Short segments (0.5 s) generally outperform longer ones.

02 GPT-5-mini provides the most accurate segment length recommendations.

03 The results offer practical guidance for large-scale neural audio retrieval systems.

Abstract

Audio fingerprinting provides an identifiable representation of acoustic signals, which can be later used for identification and retrieval systems. To obtain a discriminative representation, the input audio is usually segmented into shorter time intervals, allowing local acoustic features to be extracted and analyzed. Modern neural approaches typically operate on short, fixed-duration audio segments, yet the choice of segment duration is often made heuristically and rarely examined in depth. In this paper, we study how segment length affects audio fingerprinting performance. We extend an existing neural fingerprinting architecture to adopt various segment lengths and evaluate retrieval accuracy across different segment lengths and query durations. Our results show that short segment lengths (0.5-second) generally achieve better performance. Moreover, we evaluate LLM capacity in…
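The segmentation step the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `segment_audio`, the 8 kHz sample rate, and the 50% overlap between consecutive segments are assumptions chosen for illustration only. The sketch shows how the segment duration determines how many fixed-length segments a query of a given duration produces.

```python
import numpy as np


def segment_audio(waveform, sample_rate, segment_seconds, hop_seconds):
    """Split a 1-D waveform into fixed-duration, possibly overlapping segments.

    Hypothetical helper for illustration; trailing samples that do not fill
    a whole segment are dropped.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if seg_len <= 0 or hop_len <= 0:
        raise ValueError("segment and hop durations must be positive")
    if len(waveform) < seg_len:
        # Audio shorter than one segment yields no segments.
        return np.empty((0, seg_len), dtype=waveform.dtype)

    n_segments = 1 + (len(waveform) - seg_len) // hop_len
    starts = np.arange(n_segments) * hop_len
    # Build a (n_segments, seg_len) index grid and gather the samples.
    index = starts[:, None] + np.arange(seg_len)[None, :]
    return waveform[index]


# Assumed setup: a 2-second query at 8 kHz, with a hop of half the segment length.
sample_rate = 8000
query = np.arange(2 * sample_rate, dtype=np.float32)

short_segments = segment_audio(query, sample_rate, 0.5, 0.25)
long_segments = segment_audio(query, sample_rate, 1.0, 0.5)
print(short_segments.shape, long_segments.shape)
```

Under these assumptions, the same 2-second query yields more 0.5-second segments than 1-second segments, so a system using shorter segments has more fingerprints to match per query. Whether that is the reason for the accuracy difference reported in the paper is not stated in this summary.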

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Animal Vocal Communication and Behavior