Autonomous Skeletal Landmark Localization towards Agentic C-Arm Control

Jay Jung; Ahmad Arrabi; Jax Luo; Scott Raymond; Safwan Wshah

arXiv:2604.18740 · cs.CV · April 22, 2026

TL;DR

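This paper adapts multimodal large language models (MLLMs) to autonomous skeletal landmark localization as a step toward agentic C-arm control. The fine-tuned models reach localization accuracy competitive with deep learning methods and show the ability to reason about and correct their predictions.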

Contribution

The paper fine-tunes MLLMs for skeletal landmark localization, a building block for autonomous C-arm positioning in which the model can reason about its predictions and correct them.

Findings

1. MLLMs achieve competitive localization accuracy compared to deep learning methods.

2. Qualitative results show MLLMs can reason and correct initial predictions.

3. MLLMs can sequentially navigate C-arm towards target landmarks.
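To make the idea of sequential navigation toward a target landmark concrete, here is a minimal sketch of a bounded-step loop. It is not the paper's method: in the paper the MLLM produces the predictions, whereas here the target position is simply given. The function name `navigate`, the 2D coordinate convention, and the parameters `max_step` and `tol` are all hypothetical.

```python
import math


def navigate(start, target, max_step=50.0, tol=1.0, max_iters=100):
    """Move from `start` toward `target` in steps of at most `max_step`.

    Hypothetical illustration only: positions are (x, y) tuples in
    arbitrary units, and the target is assumed to be known in advance.
    Returns the list of visited positions, including the start.
    """
    pos = (float(start[0]), float(start[1]))
    path = [pos]
    for _ in range(max_iters):
        dx = target[0] - pos[0]
        dy = target[1] - pos[1]
        dist = math.hypot(dx, dy)
        if dist <= tol:
            break  # close enough to the target landmark
        # Take a full step if the target is in reach, else a capped step.
        scale = min(1.0, max_step / dist)
        pos = (pos[0] + dx * scale, pos[1] + dy * scale)
        path.append(pos)
    return path


path = navigate(start=(0.0, 0.0), target=(120.0, 0.0), max_step=50.0)
for step, (x, y) in enumerate(path):
    print(f"step {step}: x={x:.1f}, y={y:.1f}")
```

Capping the step size mirrors the sequential character of the finding: the position is refined over several moves rather than in a single jump, which leaves room for a correction between moves.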

Abstract

Purpose: Automated C-arm positioning ensures timely treatment in patients requiring emergent interventions. When a conventional Deep Learning (DL) approach for C-arm control fails, clinicians must revert to manual operation, resulting in additional delays. Consequently, an agentic C-arm control framework based on multimodal large language models (MLLMs) is highly desirable, as it can incorporate clinician feedback and use reasoning to make adjustments toward more accurate positioning. Skeletal landmark localization is essential for C-arm control, and we investigate adapting MLLMs for autonomous landmark localization.

Methods: We used an annotated synthetic X-ray dataset and a real X-ray dataset. Each X-ray in both datasets is paired with several skeletal landmarks. We fine-tuned two MLLMs and tasked them with retrieving the closest landmarks from each X-ray. Quantitative evaluations…
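The "closest landmarks" task in the Methods can be pictured with a small sketch that ranks annotated landmarks by distance from a reference point. Everything specific here is an assumption for illustration: the abstract does not say what the landmarks are closest to (the image center is used below), and the landmark names, pixel coordinates, and the function `closest_landmarks` are hypothetical.

```python
import math


def closest_landmarks(landmarks, reference, k=3):
    """Return the names of the k landmarks nearest to `reference`.

    `landmarks` maps a name to an (x, y) position; `reference` is an
    (x, y) point. Using Euclidean distance to a reference point is an
    assumption, not a detail taken from the paper.
    """
    ranked = sorted(
        landmarks.items(),
        key=lambda item: math.dist(item[1], reference),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical annotations for one 512 x 512 X-ray (pixel coordinates).
landmarks = {
    "left_femoral_head": (120.0, 310.0),
    "right_femoral_head": (390.0, 305.0),
    "l5_vertebra": (256.0, 140.0),
    "sacrum": (256.0, 230.0),
}
image_center = (256.0, 256.0)

print(closest_landmarks(landmarks, image_center, k=2))
# → ['sacrum', 'l5_vertebra']
```

A ranking like this could serve as the ground-truth answer against which a fine-tuned model's response is compared, though how the paper actually constructs its targets is not stated in the visible part of the abstract.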

Code & Models

Repositories

marszzibros/C-arm-localization-LLMs.git (GitHub)
