Voice-assisted Image Labelling for Endoscopic Ultrasound Classification using Neural Networks
Ester Bonmati, Yipeng Hu, Alexander Grimwood, Gavin J. Johnson, George Goodchild, Margaret G. Keane, Kurinchi Gurusamy, Brian Davidson, Matthew J. Clarkson, Stephen P. Pereira, Dean C. Barratt

TL;DR
This paper introduces a multi-modal CNN that combines voice comments and ultrasound images to automatically label endoscopic ultrasound images, reducing manual labeling effort and improving classification accuracy.
Contribution
It presents a novel neural network architecture that integrates verbal comments with image data for ultrasound image labeling, enhancing accuracy and efficiency.
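As a rough illustration of what combining the two inputs can look like, the sketch below concatenates an image feature vector with a voice-comment feature vector and passes the result through a single linear layer and a softmax. This summary does not describe the paper's actual architecture, so the late-fusion design, the function names, the feature sizes, and all numeric values here are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(image_features, voice_features, weights, biases):
    """Late fusion (an assumed design, not the paper's): concatenate both
    feature vectors, then apply one linear layer followed by a softmax."""
    fused = list(image_features) + list(voice_features)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused feature length")
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical toy inputs: 3 image features, 2 voice features, 2 classes.
image_features = [0.2, 0.7, 0.1]
voice_features = [1.0, 0.0]  # e.g. a one-hot code for a spoken label
weights = [
    [0.5, -0.2, 0.1, 1.5, 0.0],
    [-0.3, 0.8, 0.0, 0.0, 1.5],
]
biases = [0.0, 0.0]

probs = fuse_and_classify(image_features, voice_features, weights, biases)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In a real system the two feature vectors would come from learned encoders (for example a CNN for the image), and the weights would be trained rather than set by hand.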
Findings
Achieved 76% accuracy in image classification
Voice comments improve labeling performance
Reduces manual dataset annotation effort
Abstract
Ultrasound imaging is a commonly used technology for visualising patient anatomy in real-time during diagnostic and therapeutic procedures. High operator dependency and low reproducibility make ultrasound imaging and interpretation challenging, with a steep learning curve. Automatic image classification using deep learning has the potential to overcome some of these challenges by supporting ultrasound training in novices, as well as aiding ultrasound image interpretation in patients with complex pathology for more experienced practitioners. However, the use of deep learning methods requires a large amount of data in order to provide accurate results. Labelling large ultrasound datasets is a challenging task because labels are retrospectively assigned to 2D images without the 3D spatial context available in vivo or that would be inferred while visually tracking structures between frames…
