Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification
Artem Abzaliev, Humberto Pérez Espinosa, Rada Mihalcea

TL;DR
This paper explores using self-supervised speech models trained on human speech to improve dog bark classification across recognition, breed, gender, and context tasks, showing significant performance gains.
Contribution
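The paper applies self-supervised speech representation models, pre-trained on human speech, to the classification of dog vocalizations, and evaluates them on four tasks: dog recognition, breed identification, gender classification, and context grounding.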
Findings
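Speech embedding representations significantly outperform simpler classification baselines
Pre-training on large amounts of human speech provides additional performance gains on several tasks
The approach is evaluated on four tasks: dog recognition, breed identification, gender classification, and context grounding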
Abstract
Similar to humans, animals make extensive use of verbal and non-verbal forms of communication, including a large range of audio signals. In this paper, we address dog vocalizations and explore the use of self-supervised speech representation models pre-trained on human speech to address dog bark classification tasks that find parallels in human-centered tasks in speech recognition. We specifically address four tasks: dog recognition, breed identification, gender classification, and context grounding. We show that using speech embedding representations significantly improves over simpler classification baselines. Further, we also find that models pre-trained on large human speech acoustics can provide additional performance boosts on several tasks.
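The general idea in the abstract, turning each bark recording into a fixed-size embedding and then classifying it, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the frame-level vectors are made-up toy numbers standing in for the output of a pre-trained speech model, the labels "play" and "stranger" are hypothetical context classes, and the nearest-centroid classifier is a simple stand-in rather than the classifier used in the paper.

```python
from statistics import fmean


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    return [fmean(dim) for dim in zip(*vectors)]


def fit_centroids(clip_vectors, labels):
    """Compute one centroid (mean clip vector) per label."""
    grouped = {}
    for vec, label in zip(clip_vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: mean_pool(vecs) for label, vecs in grouped.items()}


def predict(centroids, clip_vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], clip_vector))


# Toy frame-level embeddings (hypothetical): each clip is a list of 2-D frames.
train_frames = [
    [[0.9, 0.1], [1.1, -0.1]],    # clip labeled "play"
    [[1.0, 0.2], [0.8, 0.0]],     # clip labeled "play"
    [[-1.0, 0.9], [-0.8, 1.1]],   # clip labeled "stranger"
    [[-1.2, 1.0], [-1.0, 0.8]],   # clip labeled "stranger"
]
train_labels = ["play", "play", "stranger", "stranger"]

# Pool frames into one vector per clip, then fit one centroid per label.
train_clips = [mean_pool(frames) for frames in train_frames]
centroids = fit_centroids(train_clips, train_labels)

# Classify an unseen clip from its pooled embedding.
new_clip = mean_pool([[0.7, 0.1], [0.9, 0.3]])
predicted_label = predict(centroids, new_clip)
```

In practice the toy vectors would be replaced by embeddings extracted from a pre-trained speech model, and the centroid classifier by whatever model suits the task; the pooling-then-classify structure is the only part this sketch is meant to convey.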
Taxonomy
Topics: Digital Media Forensic Detection · Handwritten Text Recognition Techniques · Face recognition and analysis
