Pre-Trained Foundation Model representations to uncover Breathing patterns in Speech
Vikramjit Mitra, Anirban Chatterjee, Ke Zhai, Helen Weng, Ayuko Hill, Nicole Hay, Christopher Webb, Jamie Cheng, Erdrin Azemi

TL;DR
This study leverages pre-trained speech representations from foundation models like Wav2Vec2 to accurately estimate breathing rate from speech signals, offering a non-invasive, equipment-free method for health monitoring.
Contribution
It introduces a novel approach using pre-trained speech models combined with Conv-LSTM to estimate respiration rate from speech, demonstrating improved accuracy over baseline methods.
Findings
Low root-mean-squared error in respiration time-series estimation
High correlation coefficient with ground-truth respiration data
Mean absolute error of approximately 1.6 breaths per minute
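The three findings above refer to standard regression metrics. As a minimal sketch of how such numbers are typically computed (not the authors' evaluation code; the function names and the toy values are assumptions made for illustration), RMSE and the Pearson correlation can be taken between a predicted and a reference respiration time series, and the mean absolute error between predicted and reference respiratory rates in breaths per minute:

```python
import math

def rmse(pred, ref):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def pearson_r(pred, ref):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(ref)
    mean_p = sum(pred) / n
    mean_r = sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)

def mae(pred, ref):
    """Mean absolute error, e.g. over per-recording rates in breaths per minute."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)

# Toy values only (hypothetical, not taken from the paper).
ref_signal = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
pred_signal = [0.1, 0.4, 0.9, 0.6, 0.1, -0.4, -0.9, -0.6]
ref_rates = [15.0, 18.0]   # breaths per minute
pred_rates = [14.0, 16.0]  # breaths per minute

signal_rmse = rmse(pred_signal, ref_signal)
signal_corr = pearson_r(pred_signal, ref_signal)
rate_mae = mae(pred_rates, ref_rates)
```

RMSE and correlation describe how well the estimated breathing waveform tracks the reference, while MAE in breaths per minute describes the error of the derived rate.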
Abstract
The process of human speech production involves coordinated respiratory action to elicit acoustic speech signals. Typically, speech is produced when air is forced from the lungs and is modulated by the vocal tract, where such actions are interspersed by moments of breathing in air (inhalation) to refill the lungs again. Respiratory rate (RR) is a vital metric that is used to assess the overall health, fitness, and general well-being of an individual. Existing approaches to measure RR (number of breaths one takes in a minute) are performed using specialized equipment or training. Studies have demonstrated that machine learning algorithms can be used to estimate RR using bio-sensor signals as input. Speech-based estimation of RR can offer an effective approach to measure the vital metric without requiring any specialized equipment or sensors. This work investigates a machine learning…
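The abstract defines respiratory rate as the number of breaths taken in a minute. As a rough illustration of that definition only (an assumption-laden sketch, not the method investigated in the paper), the rate can be read off a clean respiration time series by counting inhalation peaks and scaling by the recording duration. The function name, sampling rate, and synthetic signal below are hypothetical.

```python
import math

def breaths_per_minute(signal, sample_rate_hz):
    """Count strict local maxima and convert the count to breaths per minute.

    Assumes a smooth, noise-free respiration trace in which each breath
    produces exactly one peak; real signals would need filtering first.
    """
    peaks = sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] > signal[i + 1]
    )
    duration_min = len(signal) / sample_rate_hz / 60.0
    return peaks / duration_min

# Synthetic 60-second trace sampled at 10 Hz with one breath every 4 seconds.
sample_rate_hz = 10
breathing_freq_hz = 0.25
trace = [
    math.sin(2 * math.pi * breathing_freq_hz * (i / sample_rate_hz))
    for i in range(60 * sample_rate_hz)
]
rate = breaths_per_minute(trace, sample_rate_hz)
```

Peak counting is chosen here only because it is the simplest way to make the definition concrete; it is fragile under noise.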
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Memory Network
