An Initial Study on Birdsong Re-synthesis Using Neural Vocoders
Rhythm Bhatia, Tomi H. Kinnunen

TL;DR
This study compares a traditional vocoder (WORLD) with two neural vocoders for birdsong resynthesis. Listeners discriminated species equally well across all three, but rated them differently on bird-like quality, which highlights the challenges of processing wildlife audio.
Contribution
First comparative analysis-resynthesis study of birdsong using a traditional vocoder and two neural vocoders, identifying their strengths and limitations for bio-acoustic applications.
Findings
No difference in species discrimination across the three vocoders (ABX test)
WORLD vocoder samples rated higher for retaining bird-like qualities (MOS test)
All vocoders had difficulty reproducing pitch and voicing
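The pitch and voicing problems are reported from listening tests. One common objective way to quantify such problems, assumed here and not taken from the paper, is to compare F0 tracks extracted from the original and resynthesized recordings using voicing decision error (VDE) and gross pitch error (GPE). The sketch assumes frame-aligned F0 tracks in Hz, with 0 marking unvoiced frames, and the conventional 20% relative GPE tolerance. The function names and toy F0 values are hypothetical.

```python
from typing import Sequence


def voicing_decision_error(ref_f0: Sequence[float], est_f0: Sequence[float]) -> float:
    """Fraction of all frames whose voiced/unvoiced decision differs.

    Assumption: a frame is voiced when its F0 value is > 0 (0 = unvoiced).
    """
    if len(ref_f0) != len(est_f0) or not ref_f0:
        raise ValueError("F0 tracks must be non-empty and of equal length")
    mismatches = sum((r > 0) != (e > 0) for r, e in zip(ref_f0, est_f0))
    return mismatches / len(ref_f0)


def gross_pitch_error(
    ref_f0: Sequence[float], est_f0: Sequence[float], tolerance: float = 0.2
) -> float:
    """Fraction of frames voiced in BOTH tracks whose F0 deviates from the
    reference by more than `tolerance` (relative error)."""
    if len(ref_f0) != len(est_f0):
        raise ValueError("F0 tracks must be of equal length")
    both_voiced = [(r, e) for r, e in zip(ref_f0, est_f0) if r > 0 and e > 0]
    if not both_voiced:
        return 0.0  # no frame where pitch can be compared
    gross = sum(abs(e - r) / r > tolerance for r, e in both_voiced)
    return gross / len(both_voiced)


# Hypothetical toy tracks (Hz): original vs. resynthesized birdsong.
ref = [0, 4000, 4100, 4200, 0]
est = [0, 4050, 0, 5200, 3000]
print(voicing_decision_error(ref, est))  # 0.4
print(gross_pitch_error(ref, est))       # 0.5
```

In the toy tracks, one voiced frame becomes unvoiced and one spurious voiced frame appears, so VDE is 2/5; of the two frames voiced in both tracks, one deviates by about 24%, so GPE is 1/2.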
Abstract
Modern speech synthesis uses neural vocoders to model raw waveform samples directly. This increased versatility has expanded the scope of vocoders from speech to other domains, such as music. We address another such domain: bio-acoustics. We provide initial comparative analysis-resynthesis experiments on birdsong using a traditional vocoder (WORLD) and two neural vocoders (WaveNet autoencoder, Parallel WaveGAN). Our subjective results indicate no difference among the three vocoders in terms of species discrimination (ABX test). Nonetheless, the WORLD vocoder samples were rated higher in terms of retaining bird-like qualities (MOS test). All vocoders had difficulties with pitch and voicing. Our results point to some of the challenges of processing low-quality wildlife audio data.
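The abstract names two listening tests but not how their responses were analyzed. As an illustration of two common choices, assumed here rather than taken from the paper, the sketch scores one ABX condition with a one-sided exact binomial test against chance (p = 0.5) and summarizes MOS ratings as a mean with a normal-approximation 95% confidence interval. The function names and example numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def abx_binomial_p_value(n_correct: int, n_trials: int) -> float:
    """One-sided exact binomial p-value: probability of at least `n_correct`
    correct ABX answers out of `n_trials` if the listener were guessing (p = 0.5)."""
    if not 0 <= n_correct <= n_trials:
        raise ValueError("need 0 <= n_correct <= n_trials")
    tail = sum(math.comb(n_trials, k) for k in range(n_correct, n_trials + 1))
    return tail / 2 ** n_trials


def mos_with_ci(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Mean opinion score and the half-width of a normal-approximation
    confidence interval (z = 1.96 gives roughly 95%)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mean = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical example: 15 of 20 ABX trials correct for one vocoder,
# and five listeners' ratings on an assumed 1-5 scale.
print(f"ABX p = {abx_binomial_p_value(15, 20):.4f}")  # ABX p = 0.0207
mean, hw = mos_with_ci([4, 4, 5, 3, 4])
print(f"MOS = {mean:.2f} +/- {hw:.2f}")              # MOS = 4.00 +/- 0.62
```

Comparing vocoders, as in the species-discrimination result, would mean computing such scores per vocoder and then testing the differences between them; the sketch covers only the per-condition summaries.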
Taxonomy
Topics: Animal Vocal Communication and Behavior · Music and Audio Processing · Diverse Musicological Studies
