# Simultaneous Speech and Eating Behavior Recognition Using Data Augmentation and Two-Stage Fine-Tuning

**Authors:** Toshihiro Tsukagoshi, Masafumi Nishida, Masafumi Nishimura

PMC · DOI: 10.3390/s25051544 · 2025-03-02

## TL;DR

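This paper proposes a method for recognizing speech and eating behaviors simultaneously. It combines synthetic data augmentation with two-stage fine-tuning of a self-supervised learning model, maintaining speech recognition accuracy while detecting chewing and swallowing with high F1 scores for daily health monitoring.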

## Contribution

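The approach pairs data augmentation, in which speech and eating sounds of varying lengths and sequences are concatenated into synthetic training data, with two-stage fine-tuning of a self-supervised learning model for simultaneous speech and eating behavior recognition.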

## Key findings

- The method achieves an F1 score of 0.918 for chewing detection and 0.926 for swallowing detection.
- Speech recognition accuracy is maintained while improving eating behavior detection performance.

## Abstract

Speaking and eating are essential components of health management. To enable the daily monitoring of these behaviors, systems capable of simultaneously recognizing speech and eating behaviors are required. However, due to the distinct acoustic and contextual characteristics of these two domains, achieving high-precision integrated recognition remains underexplored. In this study, we propose a method that combines data augmentation through synthetic data creation with a two-stage fine-tuning approach tailored to the complexity of domain adaptation. By concatenating speech and eating sounds of varying lengths and sequences, we generated training data that mimic real-world environments where speech and eating behaviors co-exist. Additionally, efficient model adaptation was achieved through two-stage fine-tuning of the self-supervised learning model. The experimental evaluations demonstrate that the proposed method maintains speech recognition accuracy while achieving high detection performance for eating behaviors, with an F1 score of 0.918 for chewing detection and 0.926 for swallowing detection. These results underscore the potential of using voice recognition technology for daily health monitoring.
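
The augmentation step described in the abstract, concatenating speech and eating sounds of varying lengths and sequences, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `make_synthetic_sample`, the 50/50 choice between speech and eating clips, the list-of-floats waveform representation, and the toy clips are all hypothetical.

```python
import random


def make_synthetic_sample(speech_clips, eating_clips, n_segments, rng):
    """Concatenate randomly chosen clips into one waveform with aligned labels.

    Each clip is a (samples, label) pair: samples is a list of floats and
    label is a string such as a transcript, "chew", or "swallow".
    """
    waveform = []
    segments = []  # (start_index, end_index, label) for each concatenated clip
    for _ in range(n_segments):
        # Assumption: speech and eating clips are drawn with equal probability.
        pool = speech_clips if rng.random() < 0.5 else eating_clips
        samples, label = rng.choice(pool)
        start = len(waveform)
        waveform.extend(samples)
        segments.append((start, len(waveform), label))
    return waveform, segments


# Toy clips of different lengths stand in for real audio (hypothetical data).
speech = [([0.1] * 4, "hello"), ([0.2] * 6, "good morning")]
eating = [([0.3] * 3, "chew"), ([0.4] * 5, "swallow")]

rng = random.Random(0)  # seeded for reproducibility
waveform, segments = make_synthetic_sample(speech, eating, n_segments=4, rng=rng)
```

Keeping the segment boundaries alongside the concatenated waveform means each synthetic sample carries its own ground-truth labels, so a single model can be trained on mixed speech and eating sequences.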

## Full-text entities

- **Diseases:** decline in swallowing function (MESH:D003680), aspiration pneumonia (MESH:D011015), diabetes (MESH:D003920), injury to (MESH:D014947), obesity (MESH:D009765), CTC (MESH:D008310), eating (MESH:D001068), depression (MESH:D003866), cognitive decline (MESH:D003072), Alzheimer's disease (MESH:D000544)
- **Chemicals:** blood glucose (MESH:D001786), water (MESH:D014867), CTC (-)
- **Species:** Brassica oleracea (wild cabbage, species) [taxon 3712], Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11902618/full.md

---
Source: https://tomesphere.com/paper/PMC11902618