BowNet: Dilated Convolution Neural Network for Ultrasound Tongue Contour Extraction
M. Hamed Mozaffari, Won-Sook Lee

TL;DR
This paper introduces BowNet and wBowNet, two deep neural network models designed for real-time, accurate, and robust ultrasound tongue contour extraction, aiding speech analysis and impairment diagnosis.
Contribution
The paper presents two novel neural network architectures that leverage multi-scale context and full-resolution dilated convolutions for automatic tongue contour segmentation in ultrasound images.
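To make "full-resolution dilated convolutions" concrete, here is a minimal pure-Python sketch of a generic 1D dilated convolution. It is not the BowNet architecture. The function names, the kernel, and the dilation rates (1, 2, 4) are illustrative assumptions that do not come from the paper. It shows two properties: the output keeps the input length because there is no downsampling, and the receptive field grows with the dilation rate.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Zero-padded ('same') 1D dilated convolution, cross-correlation form.

    Hypothetical helper for illustration; the output has the same length
    as the input, i.e. resolution is preserved.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    half = (k // 2) * dilation  # padding needed on each side
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(kernel[j] * padded[i + j * dilation] for j in range(k)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
response = dilated_conv1d(impulse, [1, 1, 1], dilation=2)

# Assumed dilation schedule (1, 2, 4) versus three ordinary layers.
rf_dilated = receptive_field(3, [1, 2, 4])
rf_plain = receptive_field(3, [1, 1, 1])
```

With these assumed settings, `response` has the same nine samples as the input, with the impulse spread to positions two samples apart. Three stacked 3-tap layers cover 15 samples with dilation and 7 without, which is how dilation gathers wider context at no cost in resolution.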
Findings
BowNet outperforms existing methods in accuracy and robustness.
Combining localization and globalization improves segmentation results.
Models achieve real-time performance suitable for speech tracking applications.
Abstract
Ultrasound imaging is safe, relatively affordable, and capable of real-time performance. One application of this technology is to visualize and characterize human tongue shape and motion during real-time speech in order to study healthy or impaired speech production. Because ultrasound images are noisy and low in contrast, non-expert users may find it difficult to recognize organ shapes such as the tongue surface (dorsum). To ease quantitative analysis of tongue shape and motion, the tongue surface can be extracted, tracked, and visualized instead of the whole tongue region. Delineating the tongue surface manually in each frame is cumbersome, subjective, and error-prone. Furthermore, the rapidity and complexity of tongue gestures make this a challenging task, and manual segmentation is not a feasible solution for real-time applications.…
