Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech

Vikramjit Mitra; Anirban Chatterjee; Ke Zhai; Helen Weng; Ayuko Hill; Nicole Hay; Christopher Webb; Jamie Cheng; Erdrin Azemi

arXiv:2407.13035 · cs.SD · July 19, 2024 · 1 citation


Open Access

TL;DR

This study uses pre-trained speech representations from foundation models such as Wav2Vec2 to estimate breathing patterns and breathing rate from speech signals, offering a non-invasive, equipment-free method for health monitoring.

Contribution

It introduces an approach that combines pre-trained speech model representations with a Conv-LSTM network to estimate respiration rate from speech, and reports improved accuracy over baseline methods.

Findings

01

Low root-mean-squared error in respiration time-series estimation

02

High correlation coefficient with ground-truth respiration data

03

Mean absolute error of approximately 1.6 breaths per minute
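The three findings rely on standard error and agreement metrics: root-mean-squared error (RMSE) and the correlation coefficient for the estimated respiration time series, and mean absolute error (MAE) for the breathing rate. A minimal sketch of these definitions follows, assuming the estimated and ground-truth sequences are already aligned sample-for-sample and that "correlation coefficient" means Pearson's r. The function names are hypothetical, and the paper's exact evaluation protocol (for example, how values are averaged across speakers or utterances) is not described on this page.

```python
import math


def _check(pred, target):
    # Both sequences must be non-empty and aligned one-to-one.
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal length")


def rmse(pred, target):
    """Root-mean-squared error, in the units of the signal."""
    _check(pred, target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def mae(pred, target):
    """Mean absolute error, e.g. in breaths per minute for RR estimates."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def pearson_r(pred, target):
    """Pearson correlation coefficient between two sequences."""
    _check(pred, target)
    n = len(pred)
    mp = sum(pred) / n
    mt = sum(target) / n
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, target))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in target))
    if sp == 0.0 or st == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sp * st)
```

With toy values that are not from the paper, `mae([14.0, 16.5], [15.0, 15.0])` returns 1.25, i.e. an average miss of 1.25 breaths per minute, which is the same kind of quantity as the reported MAE of about 1.6.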

Abstract

The process of human speech production involves coordinated respiratory action to elicit acoustic speech signals. Typically, speech is produced when air is forced from the lungs and modulated by the vocal tract, and these actions are interspersed with moments of inhalation that refill the lungs. Respiratory rate (RR) is a vital metric used to assess the overall health, fitness, and general well-being of an individual. Existing approaches to measuring RR (the number of breaths one takes in a minute) require specialized equipment or training. Studies have demonstrated that machine learning algorithms can estimate RR using bio-sensor signals as input. Speech-based estimation of RR can offer an effective way to measure this vital metric without requiring any specialized equipment or sensors. This work investigates a machine learning…
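Because RR is defined as breaths per minute, a respiration time series (estimated from speech or recorded by a sensor) can be converted to a rate by counting breath cycles over its duration. The sketch below is one simple way to do that, assuming a roughly periodic waveform in which each inhalation produces one peak above the signal mean. The helper name, the peak-counting rule, and the 1.5 s minimum breath interval are illustrative assumptions, not the method described in the paper.

```python
import math


def estimate_breaths_per_minute(signal, fs, min_interval_s=1.5):
    """Hypothetical helper: estimate RR (breaths/min) from a sampled
    respiration waveform by counting inhalation peaks.

    signal: respiration samples, assumed roughly periodic
    fs: sampling rate in Hz
    min_interval_s: assumed minimum time between breaths; 1.5 s caps the
        estimate at 40 breaths/min and suppresses spurious double peaks
    """
    if fs <= 0:
        raise ValueError("fs must be positive")
    n = len(signal)
    if n < 3:
        raise ValueError("signal too short to contain a peak")
    mean = sum(signal) / n
    min_gap = max(1, int(min_interval_s * fs))
    peaks = []
    for i in range(1, n - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if not is_local_max or signal[i] <= mean:
            continue  # not a peak, or a small ripple below the mean
        if peaks and i - peaks[-1] < min_gap:
            continue  # too close to the previous breath
        peaks.append(i)
    duration_min = n / fs / 60.0
    return len(peaks) / duration_min


if __name__ == "__main__":
    fs = 10.0
    # Synthetic 0.25 Hz sine (15 breaths/min) lasting 60 s: illustrative only.
    sig = [math.sin(2 * math.pi * 0.25 * i / fs) for i in range(600)]
    print(estimate_breaths_per_minute(sig, fs))
```

Peak counting is only reliable on clean, quasi-periodic signals; with irregular breathing during speech, a windowed or spectral estimate may be more robust.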


Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Memory Network