# Cross-Cultural Identification of Acoustic Voice Features for Depression: A Cross-Sectional Study of Vietnamese and Japanese Datasets

**Authors:** Phuc Truong Vinh Le, Mitsuteru Nakamura, Masakazu Higuchi, Lanh Thi My Vuu, Nhu Huynh, Shinichi Tokuno

PMC · DOI: 10.3390/bioengineering13010033 · Bioengineering · 2025-12-27

## TL;DR

This study identifies acoustic voice features that may help detect depression across different cultures, using data from Vietnamese and Japanese participants.

## Contribution

The study introduces a set of cross-culturally consistent acoustic features for depression screening using Vietnamese and Japanese datasets.

## Key findings

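- A cross-cultural model using 12 selected acoustic features achieved a combined AUC of 0.934 for depression detection across both cohorts.
- Performance differed between the Japanese (AUC = 0.993) and Vietnamese (AUC = 0.913) cohorts. The authors tentatively attribute this gap to dataset heterogeneity: different diagnostic instruments (self-report BDI vs. clinician-rated HAM-D) and different sample compositions (clinical vs. mixed community).
- The study highlights the need for prospective, standardized multilingual trials to improve generalizability.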
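AUC can be read as the probability that a randomly chosen depressed participant receives a higher model score than a randomly chosen non-depressed one. The sketch below is plain Python on made-up toy scores, not the study's data or code. It computes AUC by pairwise comparison and adds a stratified percentile bootstrap interval, one common way to show how unstable the estimate becomes when positive cases are few. It also illustrates that a combined AUC, which counts cross-cohort pairs, is not simply the average of the per-cohort AUCs. The function names `auc` and `bootstrap_auc_ci` and both toy cohorts are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC. Positives and negatives are
    resampled separately so every replicate keeps the original class counts."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    stats: List[float] = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        stats.append(auc(boot_pos + boot_neg, [1] * len(boot_pos) + [0] * len(boot_neg)))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy scores for two cohorts (1 = depressed, 0 = not depressed).
cohort_a = ([0.9, 0.8, 0.7, 0.3, 0.2], [1, 1, 1, 0, 0])
cohort_b = ([0.5, 0.4, 0.45, 0.1], [1, 1, 0, 0])

auc_a = auc(*cohort_a)  # 1.0
auc_b = auc(*cohort_b)  # 0.75
# Pooling adds cross-cohort pairs, so the result is not the mean (0.875).
pooled = auc(cohort_a[0] + cohort_b[0], cohort_a[1] + cohort_b[1])  # 0.95
print(auc_a, auc_b, pooled)
print(bootstrap_auc_ci(*cohort_b, n_boot=500))
```

With only two positives in the toy cohort, the bootstrap interval is very wide; the same effect, at a smaller scale, is why the abstract urges caution about AUCs estimated from 33 high-risk cases.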

## Abstract

Acoustic voice analysis demonstrates potential as a non-invasive biomarker for depression, yet its generalizability across languages remains underexplored. This cross-sectional study aimed to identify a set of cross-culturally consistent acoustic features for depression screening using distinct Vietnamese and Japanese voice datasets. We analyzed anonymized recordings from 251 participants, comprising 123 Vietnamese individuals assessed via the self-report Beck Depression Inventory (BDI) and 128 Japanese individuals assessed via the clinician-rated Hamilton Depression Rating Scale (HAM-D). From 6373 features extracted with openSMILE, a multi-stage selection pipeline identified 12 cross-cultural features, primarily from the auditory spectrum (AudSpec), Mel-Frequency Cepstral Coefficients (MFCCs), and logarithmic Harmonics-to-Noise Ratio (logHNR) domains. The cross-cultural model achieved a combined Area Under the Curve (AUC) of 0.934, with performance disparities observed between the Japanese (AUC = 0.993) and Vietnamese (AUC = 0.913) cohorts. This disparity may be attributed to dataset heterogeneity, including mismatched diagnostic tools and differing sample compositions (clinical vs. mixed community). Furthermore, the limited number of high-risk cases (n = 33) warrants cautious interpretation regarding the reliability of reported AUC values for severe depression classification. These findings suggest the presence of a core acoustic signature related to physiological psychomotor changes that may transcend linguistic boundaries. This study advances the exploration of global vocal biomarkers but underscores the need for prospective, standardized multilingual trials to overcome the limitations of secondary data analysis.
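The abstract refers to a multi-stage selection pipeline, but this summary does not describe its stages. One plausible ingredient of a "cross-culturally consistent" criterion is requiring a feature to separate depressed from non-depressed speakers in the same direction in every cohort. The sketch below implements only that single filter in plain Python. Cohen's d as the effect measure, the 0.5 threshold, the cohort keys `VN`/`JP`, and the feature names `feat_a` to `feat_c` are all hypothetical choices for illustration. They are not the authors' pipeline and not real openSMILE feature names.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardized mean difference between groups x and y using the pooled SD.
    Each group needs at least two values (a statistics.stdev requirement)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    if pooled_var == 0:
        return 0.0
    return (mean(x) - mean(y)) / pooled_var ** 0.5


def consistent_features(
    cohorts: Dict[str, Dict[str, List[float]]],
    labels: Dict[str, List[int]],
    min_abs_d: float = 0.5,
) -> List[str]:
    """Hypothetical filter stage: keep features whose depressed-vs-control
    effect has the same sign and at least `min_abs_d` magnitude in every cohort."""
    shared = set.intersection(*(set(feats) for feats in cohorts.values()))
    kept = []
    for name in sorted(shared):
        effects = []
        for cohort, feats in cohorts.items():
            values, y = feats[name], labels[cohort]
            depressed = [v for v, lab in zip(values, y) if lab == 1]
            control = [v for v, lab in zip(values, y) if lab == 0]
            effects.append(cohens_d(depressed, control))
        same_sign = all(d > 0 for d in effects) or all(d < 0 for d in effects)
        if same_sign and all(abs(d) >= min_abs_d for d in effects):
            kept.append(name)
    return kept


# Hypothetical toy data: first three speakers depressed, last three not.
labels = {"VN": [1, 1, 1, 0, 0, 0], "JP": [1, 1, 1, 0, 0, 0]}
cohorts = {
    "VN": {
        "feat_a": [3, 4, 5, 1, 2, 3],  # d = +2
        "feat_b": [1, 2, 3, 3, 4, 5],  # d = -2
        "feat_c": [2, 3, 4, 2, 3, 4],  # d = 0
    },
    "JP": {
        "feat_a": [6, 7, 8, 4, 5, 6],  # d = +2, same direction as VN
        "feat_b": [5, 6, 7, 3, 4, 5],  # d = +2, opposite direction to VN
        "feat_c": [2, 3, 4, 2, 3, 4],  # d = 0
    },
}
print(consistent_features(cohorts, labels))  # ['feat_a']
```

A real pipeline would typically run a filter like this on training data only, because selecting features on the same samples used to report AUC tends to inflate the reported performance.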

## Linked entities

- **Diseases:** depression (MONDO:0002050)

## Full-text entities

- **Diseases:** Depression (MESH:D003866)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12837578/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12837578/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12837578/full.md

---
Source: https://tomesphere.com/paper/PMC12837578