Acoustic Impulse Responses for Wearable Audio Devices
Ryan M. Corey, Naoki Tsuda, and Andrew C. Singer

TL;DR
This paper introduces a comprehensive dataset of over 8000 acoustic impulse responses from wearable microphones, enabling evaluation of audio systems and demonstrating the advantages of body-spread microphone arrays.
Contribution
It provides a large open-access dataset of wearable acoustic impulse responses and analyzes the acoustic transfer functions across different body locations and conditions.
Findings
Body-spread microphone arrays outperform single-device arrays.
Clothing affects microphone transfer functions.
Simulated beamformers improve noise reduction.
Abstract
We present an open-access dataset of over 8000 acoustic impulse responses from 160 microphones spread across the body and affixed to wearable accessories. The data can be used to evaluate audio capture and array processing systems using wearable devices such as hearing aids, headphones, eyeglasses, jewelry, and clothing. We analyze the acoustic transfer functions of different parts of the body, measure the effects of clothing worn over microphones, compare measurements from a live human subject to those from a mannequin, and simulate the noise-reduction performance of several beamformers. The results suggest that arrays of microphones spread across the body are more effective than those confined to a single device.
**Index Terms:** Acoustic impulse response, microphone arrays, wearables, audio enhancement, hearing aids
1 Introduction
Thanks to advances in transducer technology, such as tiny digital MEMS microphones [1], multiple audio sensors can be embedded in wearable devices such as watches, headphones, eyeglasses, and other accessories. These microphones could be combined to perform array processing such as beamforming, localization, and source separation [2, 3, 4]. A wearable array with many microphones spread over a wide area would offer greater spatial resolution than the small arrays embedded in most hearing aids, headsets, and mobile phones today. Wearable microphone arrays could dramatically improve performance in assistive listening [5, 6], augmented reality [7], and machine perception applications.
There have been several wearable array designs reported in the literature, including helmets [8, 9, 10], eyeglasses [11, 12], and vests [13, 14]. However, these designs have been restricted to small areas of the body and the literature offers little guidance about how microphone placement affects performance. Furthermore, there is little publicly available data, such as impulse response measurements, that can be used to design wearable arrays and test multimicrophone processing algorithms.
Multimicrophone impulse response datasets, such as [15, 16, 17], are used to simulate sound propagation and evaluate reverberant source separation and beamforming algorithms. There is abundant publicly available data on head-related transfer functions (HRTF), which characterize directional filtering by the ears [18]. HRTF datasets, such as [19, 20], usually only include responses at the ear canals and sometimes at hearing-aid earpieces [21]. To simulate and evaluate wearable audio systems, researchers could use impulse responses measured with microphones placed all across the body. Note that whereas HRTFs are often used in human perceptual applications—for example, to create virtual sources in a listener’s auditory environment [7]—these body-related transfer functions (BRTFs) are not directly related to human hearing. Rather, they help machines to localize, separate, and enhance real-world sound sources, and could be used alongside HRTFs in listening enhancement applications.
Here we present a new dataset [22] of acoustic impulse responses measured at 160 sensor positions across the body and various wearable accessories. Version 1 of the wearable microphone dataset contains about 8000 measurements with one human subject, one mannequin, five head-mounted accessories, and six types of outerwear. The data and documentation are available through the Illinois Data Bank (https://doi.org/10.13012/B2IDB-1932389_V1), an open-access data archival service maintained by the University of Illinois at Urbana-Champaign.
The wearable microphone dataset can be used to characterize the acoustic effects of the body on wearable audio devices and to simulate microphone arrays for applications such as hearing aids, augmented reality, and human-computer interaction. In this paper, we analyze this data to describe the acoustic effects of different body parts, evaluate the mannequin as a human analogue, and compare the attenuation of different clothing worn over microphones. Finally, we use the dataset to assess designs of wearable microphone arrays for a beamforming application.
2 Impulse Response Measurements
The measurement setup is shown in Fig. 1. The impulse responses were measured in an acoustically treated recording space in the Illinois Augmented Listening Laboratory. Each half-second impulse response was computed from a ten-second linear sweep repeated three times from a studio monitor, captured by 16 Countryman B3 omnidirectional lavalier microphones, and digitized at 24 bits and 48 kHz by a Focusrite Scarlett audio interface. After each sequence of sweeps, the subject was rotated to capture impulse responses from a total of 24 source angles. The microphones were then moved to new positions and the measurements were repeated.
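The sweep-based measurement can be illustrated with a minimal sketch. The paper does not state how the sweeps were deconvolved, so the regularized spectral division below is an assumption, as are the function names `linear_sweep` and `estimate_impulse_response`, the `reg` parameter, and the requirement that each repetition is stored as a separate, time-aligned row.

```python
import numpy as np


def linear_sweep(duration_s, fs, f_start, f_stop):
    """Sine sweep whose instantaneous frequency rises linearly from f_start to f_stop."""
    t = np.arange(int(round(duration_s * fs))) / fs
    rate = (f_stop - f_start) / duration_s  # sweep rate in Hz per second
    return np.sin(2.0 * np.pi * (f_start * t + 0.5 * rate * t**2))


def estimate_impulse_response(recordings, sweep, ir_len, reg=1e-6):
    """Estimate an impulse response from repeated recordings of one sweep.

    recordings: array of shape (n_repeats, n_samples); each row is assumed
                to start at the same instant as the played sweep.
    sweep:      the excitation signal that was played.
    ir_len:     number of impulse-response samples to keep.
    reg:        regularization relative to the peak sweep power (assumed value).
    """
    # Averaging the repetitions reduces uncorrelated measurement noise.
    averaged = np.mean(np.atleast_2d(recordings), axis=0)
    # Zero-pad so the FFT-based division corresponds to linear deconvolution.
    n_fft = len(averaged) + len(sweep)
    S = np.fft.rfft(sweep, n_fft)
    Y = np.fft.rfft(averaged, n_fft)
    power = np.abs(S) ** 2
    # Regularized inverse filter: bins with little sweep energy are damped
    # instead of being amplified.
    H = Y * np.conj(S) / (power + reg * power.max())
    return np.fft.irfft(H, n_fft)[:ir_len]
```

With the parameters reported above, `sweep` would be ten seconds long at `fs = 48000`, `recordings` would have three rows, and `ir_len` would be 24000 samples (half a second). The sweep's frequency range is not given in the text and would have to be chosen by the user.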
The human subject is 181 cm tall with a head circumference of 61 cm. The hollow plastic mannequin, designed for displaying clothing, is 183 cm tall with a 56 cm head circumference. Since the mannequin head has unnaturally small ears, a soft plastic replica ear was affixed to each side of the head. These replica ears are not intended to have realistic HRTFs, since HRTF data from realistic head simulators and real humans is already readily available.
The BRTF data includes 80 microphone positions on the body, shown in Fig. 2. One microphone was placed just outside of each ear canal and affixed using medical tape. These microphones capture approximate HRTFs and can be used to simulate binaural signal processing algorithms such as spatial-cue-preserving beamformers [23, 24]. Four microphones were mounted in a pair of custom-made behind-the-ear (BTE) shells similar to those used in many hearing aids. Ten were attached to a pair of eyeglasses and the remaining 64 microphones were clipped onto the subject’s clothing.
Since a wearable microphone array might be covered by clothing, the torso measurements were repeated with different outerwear including a t-shirt, cotton dress shirt, heavy cotton sweatshirt, polyester pullover, wool coat, and leather jacket.
These BRTF measurements are supplemented by impulse responses from wearable accessories. Since many previously reported wearable arrays are mounted on the head, measurements were collected using over-the-ear headphones, a baseball cap, a hard hat, a hat with a 40 cm flat brim, and a hat with a 60 cm curved brim, each with 16 microphones.
3 Acoustic Transfer Functions
3.1 Effects of the body
The acoustic effects of the head, which humans use to localize sound, have been well studied [18]. A microphone in the left ear will capture more energy from sources on the left than sources on the right, especially at high frequencies. This interaural level difference is shown in Fig. 3. The human head has a slightly stronger acoustic shadow effect than the plastic mannequin head. The head-shadow effect measured in the treated recording space is slightly weaker than fully-anechoic KEMAR data from [19].
The rest of the body has similar shadowing effects, which cause omnidirectional wearable microphones to have directional responses, as shown in Fig. 4. A microphone on the front of the chest receives about 8 dB less sound energy from sources behind the wearer. Microphones on the temple and shoulder are shadowed from the side but not from the front.
The body-related shadow effect varies with frequency and body part. For both the human and mannequin, the shadow effect was strongest for the upper chest and weakest for the forehead, although the differences between body parts are small compared to variations across frequency. Fig. 5 shows the average difference in transfer function magnitude between the sources nearest to and farthest from each microphone on the upper chest and forehead. The transfer functions for the human and mannequin are similar in magnitude, suggesting that inexpensive plastic mannequins can be used as human analogues in wearable-microphone experiments.
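A directional response like the one described for the chest microphone can be computed from the impulse responses of a single microphone across all source angles. The sketch below is only an illustration: the helper names are hypothetical, the 24 source angles are assumed to be uniformly spaced (15 degrees apart), and broadband energy is used as the level measure, which may differ from the processing behind the paper's figures.

```python
import numpy as np


def source_angles_deg(n_angles=24):
    """Source angles in degrees, assuming uniform spacing around the wearer."""
    return np.arange(n_angles) * 360.0 / n_angles


def directional_response_db(irs):
    """Broadband energy per source angle, in dB relative to the loudest angle.

    irs: array of shape (n_angles, n_taps) holding the impulse responses
         from every source angle to one microphone.
    """
    energy = np.sum(np.asarray(irs, dtype=float) ** 2, axis=-1)
    return 10.0 * np.log10(energy / np.max(energy))


def front_to_back_db(irs, front_index=0):
    """Level difference between a chosen 'front' angle and the opposite angle."""
    response = directional_response_db(irs)
    back_index = (front_index + len(response) // 2) % len(response)
    return response[front_index] - response[back_index]
```

For a microphone on the front of the chest, `front_to_back_db` would be expected to return a value near the 8 dB figure quoted above, provided `front_index` points at the source directly in front of the wearer.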
3.2 Effects of clothing
In many wearable-audio applications, microphones might be worn in, on, or under clothing. In the HRTF literature, it has been shown that hair, eyeglasses, and hats have small but measurable effects on acoustic transfer functions to the ear [25, 26, 27] but do not significantly affect human localization performance [26, 28]. The strongest effects are from curly hairstyles that cover the pinna and wide-brimmed hats that reflect sounds from below into the ear and sounds from above away from the ear [26]. Clothing worn on the torso has little effect on HRTFs—at most, it changes the strength of multipath reflections from sources below the listener [26]—but would of course have a strong effect on BRTFs.
The attenuation due to different clothing, averaged over all microphones on the torso, is shown in Fig. 6. All garments attenuate higher frequencies, but the degree of attenuation depends on the type of clothing. The t-shirt has the smallest effect, up to 5 dB at 20 kHz. The light cotton dress shirt, heavy cotton sweatshirt, and polyester pullover have nearly identical attenuation effects. The wool coat and leather jacket have strong high-frequency attenuation, suggesting that wearable audio devices might be less useful when covered by heavy outerwear. Note that the leather jacket appears to slightly amplify sound around 200–600 Hz in this recording setup; the effect was consistent across all microphones.
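A frequency-dependent attenuation curve of this kind can be estimated by comparing impulse responses measured with and without a garment. The following sketch assumes that the two sets of responses share microphone positions and source angle, and that averaging is done over power spectra. The paper does not specify its averaging method, so the function name and those choices are assumptions.

```python
import numpy as np


def clothing_attenuation_db(ir_bare, ir_covered, fs, n_fft=None):
    """Attenuation caused by a garment, averaged over microphones.

    ir_bare, ir_covered: arrays of shape (n_mics, n_taps) measured at the
        same positions without and with the garment.
    fs: sample rate in Hz.
    Returns (freqs_hz, attenuation_db); positive values mean the garment
    reduces the received level at that frequency.
    """
    ir_bare = np.atleast_2d(ir_bare)
    ir_covered = np.atleast_2d(ir_covered)
    if n_fft is None:
        n_fft = max(ir_bare.shape[-1], ir_covered.shape[-1])
    H_bare = np.fft.rfft(ir_bare, n_fft, axis=-1)
    H_cov = np.fft.rfft(ir_covered, n_fft, axis=-1)
    # Average power (not dB values) across microphones before taking the ratio.
    p_bare = np.mean(np.abs(H_bare) ** 2, axis=0)
    p_cov = np.mean(np.abs(H_cov) ** 2, axis=0)
    tiny = np.finfo(float).tiny  # guards against division by zero
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 10.0 * np.log10((p_bare + tiny) / (p_cov + tiny))
```

Negative values in the returned curve would correspond to apparent amplification, such as the 200–600 Hz effect noted for the leather jacket.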
4 Application to Beamforming
Microphone arrays are often used for beamforming, that is, to isolate a desired source and remove unwanted noise [29, 5, 3]. A wearable array with many microphones spread across the body could perform stronger noise reduction than the small arrays included in many audio devices today. The wearable microphone dataset developed here can be used to study how performance scales with array size in a wearable application and how such arrays should be designed.
4.1 MVDR beamformer
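Let $s[t]$ be a sequence of speech samples emitted from a nonmoving source of interest. Let $\mathbf{a}[t]$ be an $M$-dimensional impulse response from the source to each of $M$ microphones in an array. Let $\mathbf{z}[t]$ be an unwanted noise sequence. Assuming linear time-invariant propagation, the sampled recorded signal is
$$\mathbf{x}[t] = \sum_{\tau} \mathbf{a}[\tau]\, s[t-\tau] + \mathbf{z}[t]. \tag{1}$$
In the frequency domain, (1) can be written
$$\mathbf{X}(\omega) = \mathbf{A}(\omega)\, S(\omega) + \mathbf{Z}(\omega), \tag{2}$$
where $\mathbf{A}(\omega)$ is the discrete-time acoustic transfer function vector and $\mathbf{X}(\omega)$, $S(\omega)$, and $\mathbf{Z}(\omega)$ are discrete-time Fourier transforms of the corresponding sequences.
If $\mathbf{z}[t]$ is a wide-sense stationary random process with power spectral density $\mathbf{R}_z(\omega)$, then the output of a minimum-variance distortionless-response (MVDR) beamformer is given in the frequency domain by
$$Y(\omega) = A_1(\omega)\, \frac{\mathbf{A}^H(\omega)\, \mathbf{R}_z^{-1}(\omega)}{\mathbf{A}^H(\omega)\, \mathbf{R}_z^{-1}(\omega)\, \mathbf{A}(\omega)}\, \mathbf{X}(\omega). \tag{3}$$
This beamformer minimizes noise power subject to the constraint that the output due to the target source has unity gain with respect to microphone 1, which is near the left ear. In a binaural system, there would be a second output with unity gain with respect to the right-ear microphone. This constraint ensures that the target source sounds natural to the listener, although any residual noise will be spatially and spectrally distorted [23].
The performance metric used in these experiments is the improvement in signal-to-noise ratio (SNR) between input and output:
$$\Delta\mathrm{SNR} = 10 \log_{10} \frac{\sum_t d^2[t]}{\sum_t \left(y[t] - d[t]\right)^2} - 10 \log_{10} \frac{\sum_t d^2[t]}{\sum_t \left(x_1[t] - d[t]\right)^2}, \tag{4}$$
where $y[t]$ is the beamformer output, $x_1[t]$ is the signal recorded at microphone 1, and $d[t]$ is the noise-free desired sequence.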
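The distortionless constraint and the SNR-improvement metric can be made concrete with a short NumPy sketch for a single frequency bin. The symbols follow the usual MVDR formulation: `A` is the transfer function vector from the target to the microphones, `Rz` is the noise power spectral density matrix, and the reference microphone is index 0. The function names are illustrative, and the metric assumes real-valued time-domain signals, with the input SNR measured at the reference microphone.

```python
import numpy as np


def mvdr_weights(A, Rz, ref=0):
    """MVDR weights w for one frequency bin, so that the output is y = w^H x.

    A:   complex vector (M,), transfer functions from target to microphones.
    Rz:  Hermitian positive-definite noise covariance (M, M).
    ref: index of the reference microphone (unity gain relative to it).
    """
    Rinv_A = np.linalg.solve(Rz, A)            # R_z^{-1} A
    denom = np.real(np.vdot(A, Rinv_A))        # A^H R_z^{-1} A, real and positive
    return Rinv_A * np.conj(A[ref]) / denom


def snr_improvement_db(d, x_ref, y):
    """Output SNR minus input SNR in dB.

    d:     noise-free desired sequence.
    x_ref: noisy signal at the reference microphone.
    y:     beamformer output.
    """
    snr_in = 10.0 * np.log10(np.sum(d**2) / np.sum((x_ref - d) ** 2))
    snr_out = 10.0 * np.log10(np.sum(d**2) / np.sum((y - d) ** 2))
    return snr_out - snr_in
```

Because `np.vdot(w, A)` equals `A[ref]` for these weights, the target reaches the output exactly as it would appear at the reference microphone, and only the noise term is altered.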
4.2 Beamforming simulation
An MVDR beamformer was simulated using several wearable array configurations with different numbers of microphones. For each of 100 trials, a target source and five interference sources were randomly placed at six of the 24 possible source locations. The source data was also randomly chosen from a set of ten-second anechoic speech clips from the VCTK corpus [30]. Since the source impulse responses are known, an MVDR beamformer with more than six inputs could achieve near-perfect performance by placing a null over each source. To prevent this overfitting, the beamformer was designed using 32 ms windowed impulse responses and diagonal loading about 10 dB below the average speech power.
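The trial setup and the two safeguards against overfitting can be sketched as follows. This is a hedged illustration rather than the authors' code. The helper names are hypothetical, the 32 ms window is implemented as simple truncation (the window shape is not specified in the text), and the noise covariance is assumed to be built from the interferers' transfer functions with equal source powers.

```python
import numpy as np


def draw_trial(rng, n_locations=24, n_sources=6):
    """Pick distinct source locations: one target plus the interferers."""
    locations = rng.choice(n_locations, size=n_sources, replace=False)
    return int(locations[0]), locations[1:]


def window_ir(ir, fs, window_ms=32.0):
    """Keep only the first window_ms of an impulse response (last axis is time)."""
    n_keep = int(round(window_ms * 1e-3 * fs))
    return ir[..., :n_keep]


def loaded_noise_covariance(A_interf, speech_power, loading_db=-10.0):
    """Noise covariance for one frequency bin with diagonal loading.

    A_interf:     complex array (K, M), one transfer-function vector per
                  interferer (K interferers, M microphones).
    speech_power: average power of each speech source (assumed equal).
    loading_db:   loading level relative to speech_power, in dB.
    """
    n_mics = A_interf.shape[1]
    # Sum over interferers of a_k a_k^H, scaled by the source power.
    R = speech_power * (A_interf.T @ A_interf.conj())
    loading = speech_power * 10.0 ** (loading_db / 10.0)
    return R + loading * np.eye(n_mics)
```

The loading term keeps every eigenvalue of the covariance at or above the loading level, so the beamformer cannot drive its output to zero along directions the interferers do not occupy. This is one way to avoid the near-perfect nulling described above.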
The results of the beamforming experiment for different numbers of microphones are shown in Fig. 7. Performance improves rapidly with the first few sensors as each new input allows the beamformer to cancel an additional source. Larger arrays offer more marginal improvements, helping to reduce residual noise and compensate for transfer-function mismatch. The locations of the microphones also affect performance: notice that the 18 microphones on the ear canals and torso outperform 32 microphones on the head. The microphones on the head are closely spaced, while those on the torso are widely separated and also more strongly shadowed by the body.
Fig. 8 shows the performance of several arrays with $M = 18$ microphones, two of which are the left and right-ear reference microphones. Comparing different head-mounted accessories, the largest hat provides the best beamforming gain because of its spatial diversity. The microphones attached to the over-the-ear headphones are too closely spaced to provide much benefit at low frequencies and do not experience a strong shadowing effect at high frequencies. The 60 cm hat is about as effective as the lower-body array, which covers the largest area among the clothing-based arrays.
5 Conclusions
Many audio products, especially wearable devices such as hearing aids and headsets, use relatively few microphones that are closely spaced. The beamforming simulation suggests that performance could be improved by using many microphones spread across the body. For example, an array of 18 microphones across the torso reduced noise by an average of about 2 dB more than an array of 18 microphones spaced across headphones. It also outperformed an array of nearly twice as many microphones covering the head alone! The experiments with clothing suggest that wearable microphones remain useful even when covered by heavy shirts and sweaters, though wind-blocking coats and jackets cause significant attenuation.
Further work is required to understand how acoustic transfer functions vary between individuals. The wearable microphone dataset could be expanded in the future to include more human subjects and wearable devices. This data will allow researchers to simulate and compare different wearable array designs and to develop new signal processing methods that take advantage of larger arrays than are typically used today.
References
- [1] E. P. Zwyssig, “Speech processing using digital MEMS microphones,” Ph.D. dissertation, The University of Edinburgh, 2013.
- [2] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications. Springer, 2013.
- [3] S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, “A consolidated perspective on multimicrophone speech enhancement and source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 25, no. 4, pp. 692–730, 2017.
- [4] E. Vincent, T. Virtanen, and S. Gannot, Audio Source Separation and Speech Enhancement. Wiley, 2018.
- [5] S. Doclo, S. Gannot, M. Moonen, and A. Spriet, “Acoustic beamforming for hearing aid applications,” in Handbook on Array Processing and Sensor Networks, S. Haykin and K. R. Liu, Eds. Wiley, 2008, pp. 269–302.
- [6] S. Doclo, W. Kellermann, S. Makino, and S. E. Nordholm, “Multichannel signal enhancement algorithms for assisted listening devices,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 18–30, 2015.
- [7] V. Valimaki, A. Franck, J. Ramo, H. Gamper, and L. Savioja, “Assisted listening using a headset: Enhancing audio perception in real, augmented, and virtual environments,” IEEE Signal Processing Magazine, vol. 32, no. 2, pp. 92–99, 2015.
- [8] M. V. Scanlon, “Helmet-mounted acoustic array for hostile fire detection and localization in an urban environment,” in Unattended Ground, Sea, and Air Sensor Technologies and Applications, vol. 6963. International Society for Optics and Photonics, 2008, p. 69630D.
