Translating AI research into reality: summary of the 2025 voice AI Symposium and Hackathon
Samantha Salvi Cruz, Jamie Toghranegar, Bradley Malin, Tarun Mehra, Bob MacDonald, Marisha Speights, Camille Noufi, Yan Fossat, Guy Fagherazzi, Nicholas Cummins, Abir Elbeji, Alden Blatter, Alexander Gelbard, Arianna Arienzo, Sebastien Baur, Katie Wetstone, Julián Peller

TL;DR
The 2025 Voice AI Symposium focused on turning voice-based AI research into real healthcare applications, emphasizing ethical practices and practical implementation.
Contribution
The paper highlights the shift from theoretical research to clinical implementation in voice AI, emphasizing ethical and translational challenges.
Findings
Voice is identified as a multimodal biomarker reflecting various health states.
The symposium emphasized the importance of ethical data practices and human-centered design in AI healthcare tools.
Implementation panels stressed workflow alignment and usability for real-world adoption.
Abstract
The 2025 Voice AI Symposium represented a transition from conceptual research to clinical implementation in vocal biomarker science. Hosted by the NIH-funded Bridge2AI-Voice consortium, the meeting convened global experts to address the methodological, ethical, and translational challenges of integrating voice-based artificial intelligence (AI) into healthcare. This mini-review synthesizes symposium insights across six domains: multimodal integration, ambient AI scribes and continuous voice capture, FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) data governance, clinical translation, interdisciplinary training, and cross-sector innovation. Research presented demonstrated voice as a latent, multimodal biomarker reflecting neurological, cardiopulmonary, and psychological states, while discussions emphasized ethical data practices and human-centered design.…
Funding: NIH Office of the Director (10.13039/100000052)
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Voice and Speech Disorders · Social Media in Health Education
Introduction
The Voice AI Symposium, hosted by the Bridge2AI-Voice consortium, is an annual event to foster collaboration, engagement, and innovation around the use of vocal biomarkers in healthcare. While previous years emphasized foundational research, the 2025 Symposium reflected a pivotal shift in the field towards implementation. International researchers, thought leaders, startup owners, clinicians, and regulators gathered in Tampa, Florida to collaborate and discuss the practical validation, deployment and integration of vocal biomarkers into healthcare delivery systems. Throughout the symposium, scientific advancements were discussed alongside ethical challenges and infrastructural requirements for clinical translation. The following conference proceedings serve as an overview and thematic discussion of the 2025 event.
Background
Voice is increasingly recognized as a non-invasive, data-rich biomarker of health, capable of capturing a wide range of physiological and psychological states (1). As a multidimensional signal tied to respiratory, cognitive, neurological, and emotional function, voice provides clinical insight into both primary voice disorders and voice-affecting conditions (1–3). Artificial intelligence (AI) and machine learning models are increasingly used to detect subtle acoustic features in voice recordings that are often undetectable to the human ear but may signal early signs of disease (4, 5). AI models trained on vocal features could be embedded in tools that assist in disease identification, classification, and dynamic monitoring of health states. The rise of wearable technologies and telemedicine has made it increasingly feasible to collect voice data passively and at scale (1, 3, 6). Despite these advancements, complex challenges remain, including the need for ethically sourced, standardized, and AI-ready datasets, as well as best practices for data privacy, consent, and mitigating algorithmic bias (4).
The National Institutes of Health (NIH)-funded Bridge2AI-Voice program is seeking to address these challenges by building a large, ethically sourced, multimodal voice database linked to clinical data. Using multi-disorder protocols built by interdisciplinary teams of experts, Bridge2AI-Voice data are collected using standardized recording protocols and then clinically validated by physicians and healthcare providers. The Bridge2AI-Voice dataset supports the development of predictive models across five clinical domains: voice disorders, respiratory conditions, mood disorders, neurological diseases, and pediatric speech and language disorders. This coordinated, multi-site effort is also producing the associated tools and training resources for reproducible, ethical, scalable, and clinically meaningful vocal biomarker research.
To further scientific discovery and collaboration among researchers in this field, Bridge2AI-Voice hosts an annual symposium that brings together experts from academia, healthcare, industry, bioethics, and patient advocacy groups. This paper presents the proceedings of the 2025 Voice AI Symposium, organized around the thematic insights and scientific discussions presented at the event and emphasizing methodological innovations, ethical considerations, and strategies for integrating vocal biomarkers into clinical practice.
Format
The 2025 Voice AI Symposium presented a comprehensive program reflecting both scientific depth and translational relevance. The meeting featured two keynote speeches: Dr. Bradley Malin (“How Synthetic Data Can (and Cannot) Help Privacy in Voice Datasets”) and Microsoft's Tarun Mehra (“Voice AI and the Future of Healthcare”). In addition to these plenary sessions, the symposium included interactive workshops, a startup pitch competition, poster presentations, a tech fair, and structured networking opportunities. Four podium and panel sessions were held on the following themes: “Emerging Research Methods and Technologies”; “Ethical Translation and Clinical Implementation of Voice AI”; “Implementation of Acoustic Biomarkers of Airway and Cardiorespiratory Disorders”; and “Advancements in Voice Biomarkers of Neurological Diseases”. Abstracts were selected through a competitive call-for-science process, and all submissions underwent peer review by a scientific committee comprising internal and external experts in the field, thereby ensuring the quality, rigor, and diversity of the program.
Methods
All symposium presentations were recorded in full. Audio files were uploaded to Whisper v3 (OpenAI, 2025), which generated transcripts for each session. These transcripts were then processed using ChatGPT (v4) to identify recurring scientific insights, conceptual patterns, and cross-disciplinary themes. The resulting thematic clusters informed the structure of this paper.
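The two-stage workflow described above (speech-to-text, then language-model theme extraction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scripts: `transcribe` and `extract_themes` are hypothetical callables standing in for the Whisper and ChatGPT steps, and grouping recordings by normalized theme label is an assumed way of forming the thematic clusters.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def cluster_sessions_by_theme(
    audio_paths: Iterable[str],
    transcribe: Callable[[str], str],
    extract_themes: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each theme label to the recordings in which it was identified.

    `transcribe` and `extract_themes` are hypothetical stand-ins for a
    speech-to-text model and a language-model prompt, respectively.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for path in audio_paths:
        transcript = transcribe(path)
        # Normalize labels so that "Governance" and "governance " merge,
        # and count each theme at most once per recording.
        labels = {label.strip().lower() for label in extract_themes(transcript)}
        for label in sorted(labels):
            if label:
                clusters[label].append(path)
    return dict(clusters)


def recurring_themes(
    clusters: Dict[str, List[str]], min_sessions: int = 2
) -> List[str]:
    """Return themes found in at least `min_sessions` recordings,
    most frequent first (ties broken alphabetically)."""
    kept = [
        (theme, len(paths))
        for theme, paths in clusters.items()
        if len(paths) >= min_sessions
    ]
    return [theme for theme, _ in sorted(kept, key=lambda item: (-item[1], item[0]))]
```

Passing the two model steps in as callables keeps the sketch independent of any particular transcription or language-model API, and lets the clustering logic be checked with stub functions before real recordings are processed.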
Results
The following results are organized around six emergent themes identified through transcript analysis of all recorded events: the multimodality of voice data; ambient AI scribes and the ethical implications of continuous voice capture; FAIR (Findable, Accessible, Interoperable, Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, Ethics) principles for voice AI; translational readiness of vocal biomarkers; the need for interdisciplinary training; and innovation through cross-sector engagement. Each section summarizes and discusses the most pressing and high-impact scientific points raised at this year's event.
An exploration of voice and multimodality
Research presented at the 2025 event reflected an emerging theme: the concept of voice as a latent, multimodal biomarker capable of providing a holistic lens on health beyond discrete symptom tracking. While previous studies have explored vocal features in neurological, cardiopulmonary, and psychiatric contexts, this year's presentations added empirical depth and interdisciplinary range, presenting voice as a complex, integrative signal with diagnostic, prognostic, and monitoring potential (1, 7). Rather than pointing to isolated conditions, vocal patterns increasingly reflect the interaction of physiological, neurological, and psychological systems. This evolving perspective is shifting the field from narrow, disease-specific modeling to broader, systems-level approaches. Vocal markers, such as articulation rate, prosody, timing, and lexical variation, are being investigated as early signals of general physiological decline (4). Researchers are now turning to probabilistic, multi-output models that can link these features to composite health states, which mirrors the complexity of comorbid, real-world populations. Dr. Lampros Kourtis illustrated this approach with findings from the Framingham Heart Study (8), which paired over 4,000 voice recordings with MRI-derived brain volume data. Vocal markers like jitter, articulation rate, and lexical diversity were significantly associated with structural changes in memory-related brain regions (8). In cardiopulmonary research, Dr. Jas Sara's team developed models, leveraging both HuBERT and traditional signal-processing approaches, that detect subtle changes in breath support, phonatory control, and speech timing. These tools flag early signs of decompensation in heart failure patients and outperform standard clinical indices, offering clinicians a more proactive and cost-effective way to manage chronic care. 
Together, these studies reflect a growing consensus: voice is a sensitive, often pre-symptomatic, marker of physiological change.
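To make the acoustic and lexical markers named above concrete, the sketch below computes three of them from measurements that are assumed to have been extracted already. The formulas are common textbook definitions chosen here as assumptions: local jitter as the mean absolute difference between consecutive glottal periods divided by the mean period, articulation rate as syllables per second of speech with pauses excluded, and lexical diversity as a type-token ratio. The studies presented at the symposium may define these markers differently, and the function names are hypothetical.

```python
from typing import Sequence


def local_jitter(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle period variability, relative to the mean period."""
    if len(periods_s) < 2:
        raise ValueError("need at least two glottal periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_period = sum(periods_s) / len(periods_s)
    if mean_period <= 0:
        raise ValueError("periods must be positive")
    return (sum(diffs) / len(diffs)) / mean_period


def articulation_rate(n_syllables: int, total_s: float, pause_s: float) -> float:
    """Syllables per second of actual speech (pause time excluded)."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("speaking time must be positive")
    return n_syllables / speaking_s


def type_token_ratio(text: str) -> float:
    """Unique words divided by total words, case-insensitive."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        raise ValueError("text contains no words")
    return len(set(tokens)) / len(tokens)
```

In a multi-output setting such as the one described above, values like these would form part of the feature vector for each recording; note that the type-token ratio is sensitive to transcript length, so comparisons are only meaningful across samples of similar size.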
Ambient AI scribes and ethical implications of continuous voice capture
Recent advances in voice-based AI have highlighted ambient AI scribe systems as a promising interface between vocal biomarker research and clinical deployment. In his keynote, Tarun Mehra emphasized that ambient scribe technologies, originally designed to reduce documentation burden, are rapidly evolving into platforms capable of passively capturing clinically meaningful vocal signals during routine care encounters. This shift reframes voice not only as an active diagnostic input but as a continuously acquired signal embedded within everyday clinical workflows. However, symposium discussions highlighted that this paradigm introduces distinct ethical and governance challenges that extend beyond traditional voice AI use cases. Unlike task-based recordings, ambient scribe systems collect speech continuously, raising questions around consent granularity, secondary data use, speaker identifiability, and power asymmetries between patients, clinicians, and health systems. Ethical panels and Dr. Bradley Malin's keynote emphasized that safeguards such as synthetic data, de-identification, and auditability are necessary but insufficient. They further underscored the need for participatory governance models and transparency around how ambient voice data are repurposed for model development. Importantly, speakers cautioned that ambient AI scribes risk amplifying existing inequities if deployed without attention to linguistic diversity, sociocultural context, and algorithmic bias, particularly given the variability of speech across dialects, health states, and care settings. Collectively, these discussions highlighted that ambient AI scribes require coordinated attention to technical performance, ethical governance, and real-world evaluation to support clinical use.
FAIR and CARE in voice AI
Researchers face increasing pressure to identify best practices for ethical voice data governance. At this year's symposium, speakers addressed these concerns and emphasized that advances in voice AI must be guided by governance frameworks grounded in both FAIR and CARE principles (9, 10). Dr. Bradley Malin's keynote on synthetic data highlighted both its promise for protecting voice privacy and its limitations, underscoring the need for nuanced consent and governance protocols given the uniquely personal nature of voice. A panel discussion noted that 38 publicly available voice repositories have significant gaps in metadata, demographic representation, and transparency, illustrating risks to both FAIR and CARE objectives. Speakers stressed that governance must remain iterative and participatory, with human oversight and interpretability central to ethical implementation. As Dr. Rhoda Au noted, equity in voice AI extends beyond datasets to the diversity of teams developing these technologies. Embedding FAIR and CARE principles throughout design, governance, and practice is essential for building trust and ensuring responsible scale-up in healthcare applications (1).
Translational readiness: real-world clinical integration
As the vocal biomarker field advances, there is increasing interest in translating findings into real-world clinical practice. Dr. Hugo Botha offered a physician's perspective on this challenge. His team has piloted voice-based decision support tools in post-operative care units, using speech analysis to monitor recovery and flag early signs of complications. These tools have the potential to streamline documentation and enhance care team coordination. Still, Botha emphasized that successful deployment depends on seamless integration. Voice tools must align with existing clinical workflows, electronic health record systems, and the everyday communication patterns of providers; otherwise, they risk becoming burdensome rather than beneficial. Expert panelists discussed barriers to real-world implementation, including time-constrained workflows, infrastructure disparities, and valid concerns about patient privacy (see Figure 1). Many advocated for low-burden, device-agnostic tools that operate within existing clinical ecosystems, especially in under-resourced settings.
Figure 1. Vocal biomarker implementation barriers and solutions in clinical care.
Other important considerations for effective implementation include the need to mitigate algorithmic bias. An educational workshop provided a deep-dive discussion on incorporating linguistic and speaker variability into datasets to ensure robust translation of voice AI technologies into real-world clinical and social contexts. Dr. Satrajit Ghosh's team offered technical strategies to support this aim, including cross-lingual embeddings and culturally attuned annotation protocols. They emphasized that achieving translational success requires technical accuracy alongside design choices that foster usability, trust, and alignment with care delivery practices.
The need for interdisciplinary training for voice AI researchers
Building AI models with voice data for clinical applications introduces challenges unfamiliar to many computer scientists, requiring integration of biomedical, clinical, and ethical expertise. The 2025 Voice AI Symposium offered interactive educational workshops on the Bridge2AI-Voice dataset and resources, as well as a full-day hackathon with a non-programmer educational track, to provide in-depth training on how to ethically and effectively build models with voice data for healthcare applications. Workshops included a live videostroboscopy demo illustrating the physiological basis of vocal signals, a demonstration of how to use the Bridge2AI-Voice data and tools, and an in-depth discussion of data standards. Collectively, these initiatives highlighted the imperative to cultivate hybrid professionals with both technical and clinical competencies.
Translational innovation and cross-sector engagement
A key component of the Voice AI Symposium was engagement with the industry and startup sectors, which are translating voice AI research into healthcare tools. Use cases presented by teams in the pitch competition spanned depression risk screening, congestive heart failure monitoring, assistive communication technologies, and occupational health applications, reflecting the breadth of proposed clinical and non-clinical deployments for voice-based models. Importantly, the evaluation criteria emphasized usability, workflow integration, and readiness for prospective clinical validation, underscoring that translational success depends on alignment with real-world care environments rather than model performance alone.
Discussion
The 2025 Voice AI Symposium reflected a field advancing from early proof-of-concept studies toward the practical challenges of clinical translation. Presentations demonstrated how acoustic features can reflect interactions among neurological, cardiopulmonary, and psychological systems, pointing to a broader systems-level framing of health (2, 8). This shift marks an important maturation: rather than treating voice as a disease-specific signal, researchers are increasingly positioning it as an integrative marker of overall physiological state. The discussions suggested that this conceptual transition is shaping the future of the field more than any single technical advance. Methodologically, researchers are continuing to refine multimodal integration strategies and develop analytic approaches that can handle both population diversity and individual variability over time (11–13).
Workforce development is a parallel imperative: building a pipeline of researchers trained to work across computer science, biomedicine, and ethics will be essential for sustaining progress. Finally, international governance frameworks specific to voice data are still being developed to ensure that datasets and models are built responsibly and equitably (9, 10). Together, these directions point toward a field that is increasingly translational, but also reflective about the conditions under which voice AI can become a reliable and inclusive part of healthcare practice.
Conclusion
Voice AI is entering its translational phase, moving from possibility to clinical practice. This mini-review captures a pivotal moment in that journey: where breakthroughs in science, ethics, and tooling converged toward actionable pathways. However, continued progress will require collective accountability, with community values shaping consent protocols, governance frameworks, and the practical decisions around when, where, and how voice technologies are deployed.
Voice carries diagnostic power, but also memory, identity, and emotional resonance. As we continue to build, test, and deploy these technologies, we must remain attuned to both. Through sustained collaboration and ethical vigilance, voice AI can become a foundation for care that is intelligent, inclusive, and genuinely responsive.
References
1. Fagherazzi G, Fischer A, Ismael M, Despotovic V. Voice for health: the use of vocal biomarkers from research to clinical practice. Digit Biomark. (2021) 5(1):78–88. doi: 10.1159/000515346. PMID: 34056518. PMCID: PMC8138221.
2. Cummins N, Scherer S, Krajewski J, Schnieder S, Epps J, Quatieri TF. A review of depression and suicide risk assessment using speech analysis. Speech Commun. (2015) 71:10–49. doi: 10.1016/j.specom.2015.03.004.
3. Low DM, Bentley KH, Ghosh SS. Automated assessment of psychiatric disorders using speech: a systematic review. Laryngoscope Investig Otolaryngol. (2020) 5(1):96–116. doi: 10.1002/lio2.354. PMID: 32128436. PMCID: PMC7042657.
4. Tsanas A, Little MA, McSharry PE, Ramig LO. Accurate telemonitoring of Parkinson’s disease progression by noninvasive speech tests. IEEE Trans Biomed Eng. (2010) 57(4):884–93. doi: 10.1109/TBME.2009.2036000. PMID: 19932995.
5. Wodziński M, Skalski A, Hemmerling D, Orozco-Arroyave JR, Nöth E. Deep learning approach to Parkinson’s disease detection using voice recordings and convolutional neural network dedicated to image classification. Annu Int Conf IEEE Eng Med Biol Soc. (2019) 2019:717–20. doi: 10.1109/EMBC.2019.8856972. PMID: 31945997.
6. Rogers HP, Hseu A, Kim J, Silberholz E, Jo S, Dorste A, et al. Voice as a biomarker of pediatric health: a scoping review. Children. (2024) 11(6):684. doi: 10.3390/children11060684. PMID: 38929263. PMCID: PMC11201680.
7. Hemmerling D, Wojcik-Pedziwiatr M. Prediction and estimation of Parkinson’s disease severity based on voice signal. J Voice. (2022) 36(3):439–e9. doi: 10.1016/j.jvoice.2020.06.004. PMID: 32807590.
8. Ding H, Hamel AP, Karjadi C, Ang TFA, Lu S, Thomas RJ, et al. Association between acoustic features and brain volumes: the Framingham heart study. Front Dement. (2023) 2:1214940. doi: 10.3389/frdem.2023.1214940. PMID: 38911669. PMCID: PMC11192548.
