Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation

Jing Zhang; Wentao Jiang; Tao Huang; Zhiwei Wang; Jianxin Liu; Jian Chen; Ping Ye; Gang Wang; Zengmao Wang; Bo Du; Dacheng Tao

arXiv:2604.28011 · cs.CV · May 1, 2026

TL;DR

Echo-α is a multimodal reasoning model that unifies lesion detection and clinical reasoning for ultrasound interpretation, achieving superior accuracy and interpretability across multiple benchmarks.

Contribution

It introduces an agentic framework that combines specialized detectors with global reasoning, trained with a supervised curriculum followed by reinforcement learning.

Findings

01

Outperforms baselines on renal and breast ultrasound benchmarks.

02

Achieves 56.73%/43.78% [email protected] for grounding on cross-center tests.

03

Reaches 74.90%/49.20% diagnostic accuracy on renal/breast ultrasound.

Abstract

Ultrasound interpretation requires both precise lesion localization and holistic clinical reasoning, yet existing methods typically excel at only one of these capabilities: specialized detectors offer strong localization but limited reasoning, whereas multimodal large language models (MLLMs) provide flexible reasoning but weak grounding in specialized medical domains. We present Echo-α, an agentic multimodal reasoning model for ultrasound interpretation that unifies these strengths within an invoke-and-reason framework. Echo-α is trained to coordinate organ-specific detector outputs, integrate them with global visual context, and convert the resulting evidence into grounded diagnostic decisions beyond detector-only inference. This behavior is established through a nine-task supervised curriculum and then refined by sequential reinforcement learning under different reward…
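The abstract describes the invoke-and-reason loop only at a high level. The following is a minimal sketch of that control flow, not the authors' implementation. It assumes a hypothetical per-organ detector registry and a `reasoner` callable that stands in for the MLLM. The names `Detection`, `invoke_and_reason`, `toy_reasoner`, and `min_score` are illustrative and are not taken from the MiliLab/Echo-Alpha repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels (assumed convention)

@dataclass
class Detection:
    """One candidate finding from an organ-specific detector (hypothetical format)."""
    label: str
    box: Box
    score: float

# A detector maps an image to candidates; a reasoner maps image + evidence to a decision.
DetectorFn = Callable[[object], List[Detection]]
ReasonerFn = Callable[[object, List[Detection]], dict]

def invoke_and_reason(image: object, organ: str,
                      detectors: Dict[str, DetectorFn],
                      reasoner: ReasonerFn,
                      min_score: float = 0.3) -> dict:
    # Invoke: call the detector registered for this organ, if any, as an external tool.
    detect: Optional[DetectorFn] = detectors.get(organ)
    candidates = detect(image) if detect is not None else []
    # Filter: keep confident candidates as evidence rather than as the final answer.
    evidence = [d for d in candidates if d.score >= min_score]
    # Reason: the reasoner sees the full image plus the evidence and makes the final call.
    return reasoner(image, evidence)

def toy_reasoner(image: object, evidence: List[Detection]) -> dict:
    # Placeholder for the MLLM: ignores the image and reports the top-scoring box.
    if not evidence:
        return {"finding": "no lesion detected", "box": None}
    best = max(evidence, key=lambda d: d.score)
    return {"finding": best.label, "box": best.box}

# Usage with a stubbed kidney detector; no breast detector is registered.
detectors: Dict[str, DetectorFn] = {
    "kidney": lambda img: [Detection("cyst", (10, 10, 50, 50), 0.8),
                           Detection("stone", (60, 60, 70, 70), 0.2)],
}
print(invoke_and_reason(None, "kidney", detectors, toy_reasoner))
# {'finding': 'cyst', 'box': (10, 10, 50, 50)}
print(invoke_and_reason(None, "breast", detectors, toy_reasoner))
# {'finding': 'no lesion detected', 'box': None}
```

In the model the abstract describes, the reasoning step is a trained MLLM that can use global visual context to go beyond detector-only inference. The stub above shows only where that step sits relative to the detector call.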

Code & Models

Repositories

MiliLab/Echo-Alpha (GitHub)