# Sound Search by Text Description or Vocal Imitation?

**Authors:** Yichi Zhang, Yiting Zhang, Zhiyao Duan

arXiv: 1907.08661 · 2019-07-23

## TL;DR

This study compares sound search by vocal imitation and by text description through a user experiment. It finds that the vocal-imitation engine earned significantly higher satisfaction for sounds that are hard to describe in words, and a better overall ease-of-use rating.

## Contribution

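It presents a pilot subjective study of sound search built on two web-based engines, Vroom! (query by vocal imitation) and TextSearch (query by text description). It also provides an experimental framework that hosts both engines and collects statistics on user behavior and ratings.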

## Key findings

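- Vroom! (vocal imitation) received significantly higher search satisfaction ratings than TextSearch for sound categories that subjects found difficult to describe in text.
- Vroom! received a better overall ease-of-use rating than TextSearch on the limited sound library used in the experiments.
- Together, these results suggest practical advantages of vocal-imitation-based sound search.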
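The abstract reports that satisfaction was *significantly* higher for hard-to-describe categories. This summary does not state which statistical test the authors used. As an illustration only, the sketch below compares two groups of ratings for a single sound category using a two-sided permutation test on the difference in means. This is a swapped-in technique that assumes the two groups are independent. The ratings are toy values, not results from the paper.

```python
import random


def mean_diff(a, b):
    """Difference in mean rating between group a and group b."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean ratings.

    Assumes the two groups are independent (a paired design, where each
    subject rates both engines, would call for a paired test instead).
    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed makes the result reproducible
    observed = mean_diff(a, b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign ratings to the two groups
        diff = mean_diff(pooled[:n_a], pooled[n_a:])
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (extreme + 1) / (n_resamples + 1)


# Toy satisfaction ratings for one hard-to-describe category (not study data).
vroom = [5, 4, 5, 4, 5, 3, 4, 5]
text = [2, 3, 2, 3, 1, 3, 2, 2]
diff, p = permutation_test(vroom, text)
print(f"mean difference = {diff:.3f}, p = {p:.4f}")
```

Running the test separately for each category mirrors the way the findings are stated: the advantage is reported for categories that are hard to describe, not uniformly across all sounds.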

## Abstract

Searching for sounds by text labels is often difficult, as text descriptions cannot describe the audio content in detail. Query by vocal imitation bridges this gap and offers a novel approach to sound search. Several algorithms for sound search by vocal imitation have been proposed and evaluated in a simulation environment; however, they have not been deployed in a real search engine or evaluated by real users. This pilot work conducts a subjective study to compare these two approaches to sound search, and tries to answer the question of which approach works better for which kinds of sounds. To do so, we developed two web-based search engines for sound, one by vocal imitation (Vroom!) and the other by text description (TextSearch). We also developed an experimental framework that hosts these engines and collects statistics on user behavior and ratings. Results showed that Vroom! received significantly higher search satisfaction ratings than TextSearch did for sound categories that were difficult for subjects to describe by text. Results also showed a better overall ease-of-use rating for Vroom! than TextSearch on the limited sound library in our experiments. These findings suggest advantages of vocal-imitation-based search for sound in practice.
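The abstract contrasts searching by text labels with searching by a vocal imitation. Below is a minimal sketch of the two query styles under stated assumptions. `text_search` is a hypothetical keyword-overlap baseline and is not the actual TextSearch engine. `imitation_search` ranks sounds by cosine similarity between fixed-length embeddings. It assumes that some model, which this summary does not specify, maps both imitations and library sounds into a shared vector space. The file names, labels, and three-dimensional embeddings are made up.

```python
import math


def text_search(query, library):
    """Hypothetical keyword baseline: rank sounds by how many query words
    match their text labels; sounds with no overlap are dropped."""
    words = set(query.lower().split())
    scored = []
    for name, labels, _ in library:
        overlap = len(words & {lab.lower() for lab in labels})
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda t: (-t[0], t[1]))  # most overlap first, then by name
    return [name for _, name in scored]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def imitation_search(query_embedding, library):
    """Hypothetical query-by-example: rank every sound by cosine similarity
    between the imitation's embedding and the sound's embedding."""
    scored = [(cosine(query_embedding, emb), name) for name, _, emb in library]
    scored.sort(key=lambda t: (-t[0], t[1]))  # most similar first
    return [name for _, name in scored]


# Toy library: (file name, text labels, assumed precomputed embedding).
library = [
    ("dog_bark.wav", ["dog", "bark", "animal"], [0.9, 0.1, 0.0]),
    ("synth_sweep.wav", ["synthesizer"], [0.1, 0.8, 0.3]),
    ("door_creak.wav", ["door", "creak"], [0.0, 0.3, 0.9]),
]
print(text_search("rising synth whoosh", library))  # []
print(imitation_search([0.2, 0.9, 0.2], library))   # ['synth_sweep.wav', 'door_creak.wav', 'dog_bark.wav']
```

In this toy example, the text query "rising synth whoosh" returns nothing because none of its words match the label "synthesizer". The imitation query still ranks the synthesizer sweep first. This mirrors the abstract's point that text descriptions cannot describe audio content in detail.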

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.08661/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1907.08661/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1907.08661/full.md

---
Source: https://tomesphere.com/paper/1907.08661