Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks

Subin Erattakulangara; Karthika Kelat; Katie Burnham; Rachel Balbi; Sarah E. Gerard; David Meyer; Sajan Goud Lingala

arXiv:2501.06229 · cs.CV · March 3, 2025

TL;DR

This paper introduces an open-source, manually annotated 3D MRI vocal tract database and benchmarks 2D and 3D convolutional and Transformer networks for automatic segmentation, with the aim of reducing the time and errors associated with manual segmentation.

Contribution

It provides a new annotated dataset and compares the performance of various deep learning architectures for vocal tract segmentation from 3D MRI.

Findings

01

Deep learning models achieve high segmentation accuracy

02

Transformers outperform CNNs in certain scenarios

03

Open-source dataset facilitates future research

Abstract

Accurate segmentation of the vocal tract from magnetic resonance imaging (MRI) data is essential for various voice and speech applications. Manual segmentation is time-intensive and susceptible to errors. This study aimed to evaluate the efficacy of deep learning algorithms for automatic vocal tract segmentation from 3D MRI.
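
This summary does not say which metric the benchmark uses. As an illustration only, one common way to score an automatic 3D segmentation against a manual annotation is the Dice overlap coefficient. The sketch below assumes binary masks represented as sets of voxel indices; the function name `dice_coefficient` and the toy masks are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) index of a voxel labeled as vocal tract


def dice_coefficient(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice overlap between two binary 3D masks given as sets of voxel indices."""
    if not pred and not truth:
        # Both masks empty: treated here as perfect agreement (a convention,
        # not something specified by the source).
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


# Toy data: a 2x2x2 "manual" annotation, and a prediction that misses one
# voxel and adds one voxel outside the annotated region.
manual = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
predicted = (manual - {(1, 1, 1)}) | {(2, 0, 0)}

score = dice_coefficient(predicted, manual)
print(f"Dice = {score:.3f}")
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels, so values from different models on the same annotated volume can be compared directly.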
