LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound

Xuechen Guo; Wenhao Chai; Shi-Yan Li; Gaoang Wang

arXiv:2410.15074·cs.CV·October 22, 2024

TL;DR

LLaVA-Ultra is a specialized multimodal model that combines Chinese language understanding with ultrasound image analysis, enabling accurate medical visual question answering through fine-grained, data-efficient training.

Contribution

The paper introduces a novel architecture with a fusion module and weighted scoring for medical images, along with a large-scale Chinese ultrasound dataset for effective fine-tuning.
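The summary above does not say how the weighted scoring is computed. As a rough illustration only, the sketch below assumes one plausible form: each image feature in a redundant group is scored by cosine similarity to a query feature, the scores are normalized with a softmax, and the features are pooled with those weights. The function names (`weighted_pool`, `cosine`), the `temperature` parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def weighted_pool(image_feats, query_feat, temperature=0.1):
    """Hypothetical weighted scoring: weight each image feature by its
    similarity to the query, then return (weights, pooled feature).
    This is an assumed scheme, not the paper's documented method."""
    scores = [cosine(f, query_feat) / temperature for f in image_feats]
    weights = softmax(scores)
    dim = len(query_feat)
    pooled = [sum(w * f[i] for w, f in zip(weights, image_feats)) for i in range(dim)]
    return weights, pooled


# Toy example: three images of one exam, the third is unrelated to the query.
image_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query_feat = [1.0, 0.0]
weights, pooled = weighted_pool(image_feats, query_feat)
print(weights, pooled)
```

A lower `temperature` concentrates the weight on the most relevant image, while a higher one approaches a plain average; which behavior suits redundant ultrasound frames is a design choice this sketch does not settle.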

Findings

01 Outperforms previous models on Med-VQA datasets

02 Demonstrates robustness in medical ultrasound scenarios

03 Achieves state-of-the-art accuracy in medical visual question answering

Abstract

Multimodal Large Language Models (MLLMs) have recently garnered attention as a prominent research focus. By harnessing powerful LLMs, they facilitate the transition of conversational generative AI from unimodal text to multimodal tasks. This boom has begun to significantly impact the medical field. However, general visual language models (VLMs) lack sophisticated comprehension for medical visual question answering (Med-VQA). Even models specifically tailored to the medical domain tend to produce vague answers with weak visual relevance. In this paper, we propose a fine-grained adaptive VLM architecture for Chinese medical visual conversations through parameter-efficient tuning. Specifically, we devise a fusion module with fine-grained vision encoders to enhance subtle medical visual semantics. We then note that data redundancy, which is common in medical scenes, is ignored in most prior works.…

Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Artificial Intelligence Applications

Methods: Softmax · Attention Is All You Need · Knowledge Distillation