From Decision to Action in Surgical Autonomy: Multi-Modal Large Language Models for Robot-Assisted Blood Suction

Sadra Zargarzadeh; Maryam Mirzaei; Yafei Ou; Mahdi Tavakoli

arXiv:2408.07806 · cs.RO · January 30, 2025




TL;DR

This paper introduces a multi-modal large language model system for autonomous blood suction in robotic surgery, combining high-level reasoning with low-level motion control to handle complex, dynamic surgical scenarios.

Contribution

It presents a novel distributed architecture integrating LLMs and deep reinforcement learning for autonomous surgical decision-making and action execution.

Findings

01. Multi-modal LLM improves surgical reasoning in complex scenarios
02. System effectively handles blood clots and active bleeding
03. Enhanced decision-making accuracy in autonomous blood suction

Abstract

The rise of Large Language Models (LLMs) has impacted research in robotics and automation. While progress has been made in integrating LLMs into general robotics tasks, a noticeable void persists in their adoption in more specific domains such as surgery, where critical factors such as reasoning, explainability, and safety are paramount. Achieving autonomy in robotic surgery, which entails the ability to reason and adapt to changes in the environment, remains a significant challenge. In this work, we propose a multi-modal LLM integration in robot-assisted surgery for autonomous blood suction. The reasoning and prioritization are delegated to the higher-level task-planning LLM, and the motion planning and execution are handled by the lower-level deep reinforcement learning model, creating a distributed agency between the two components. As surgical operations are highly dynamic and may…
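The distributed agency described in the abstract (a high-level planner that reasons about and prioritizes targets, and a low-level controller that plans and executes the motion) can be sketched as a two-layer loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the multi-modal LLM is replaced by a hypothetical rule-based `plan_targets` function, the deep reinforcement learning policy by a hypothetical proportional step `policy_step`, and `BloodRegion`, its fields, and the example coordinates are invented for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class BloodRegion:
    """A detected blood region (hypothetical perception output)."""
    name: str
    centroid: Point        # target position for the suction tip (arbitrary units)
    area: float            # visible blood area (arbitrary units)
    active_bleeding: bool  # True if the source is still bleeding


def plan_targets(regions: List[BloodRegion]) -> List[BloodRegion]:
    """High-level layer: stand-in for the multi-modal LLM task planner.

    Hypothetical placeholder rule (not the paper's reasoning): actively
    bleeding regions first, then the rest from largest to smallest area.
    """
    return sorted(regions, key=lambda r: (not r.active_bleeding, -r.area))


def policy_step(tip: Point, target: Point, gain: float = 0.5) -> Point:
    """Low-level layer: stand-in for the DRL policy.

    Moves the suction tip a fixed fraction of the way toward the target.
    """
    return (tip[0] + gain * (target[0] - tip[0]),
            tip[1] + gain * (target[1] - tip[1]))


def run_episode(regions: List[BloodRegion], tip: Point = (0.0, 0.0),
                tol: float = 0.05, max_steps: int = 50):
    """The planner decides *what* to suction and in which order;
    the policy decides *how* to move the tip to each target."""
    visited = []
    for region in plan_targets(regions):
        for _ in range(max_steps):
            tip = policy_step(tip, region.centroid)
            if math.dist(tip, region.centroid) < tol:
                break
        visited.append(region.name)
    return visited, tip


# Usage with three made-up regions; only "B" is actively bleeding.
regions = [
    BloodRegion("A", (1.0, 0.0), 2.0, False),
    BloodRegion("B", (0.0, 1.0), 0.5, True),
    BloodRegion("C", (-1.0, 0.0), 3.0, False),
]
order, final_tip = run_episode(regions)
print(order)  # ['B', 'C', 'A']
```

Because the two layers communicate only through an ordered list of targets, either side could in principle be swapped (a real LLM call for the planner, a trained policy for the controller) without changing the other, which is the appeal of splitting decision from action.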
