GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via   VLM

Keshav Bimbraw; Ye Wang; Jing Liu; Toshiaki Koike-Akino

arXiv:2407.10870·cs.CV·July 16, 2024

GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

Keshav Bimbraw, Ye Wang, Jing Liu, Toshiaki Koike-Akino

PDF

Open Access

TL;DR

This paper demonstrates that large vision-language models like GPT-4o can decode hand gestures from forearm ultrasound images without fine-tuning, leveraging few-shot learning to enhance performance in a specialized medical task.

Contribution

The study shows that GPT-4o can perform gesture decoding from ultrasound images without fine-tuning, highlighting the potential of foundation models in medical applications.

Findings

01

GPT-4o successfully decodes hand gestures from ultrasound images.

02

Few-shot learning improves gesture decoding accuracy.

03

No fine-tuning required for effective performance.

Abstract

Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their capability without fine-tuning is often limited in specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation/memory/dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and improves with few-shot, in-context learning.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsArtificial Intelligence in Healthcare and Education

MethodsAttention Is All You Need · Byte Pair Encoding · Layer Normalization · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections · Softmax