SurgVisAgent: Multimodal Agentic Model for Versatile Surgical Visual Enhancement

Zeyu Lei; Hongyuan Yu; Jinlin Wu; Zhen Chen

arXiv:2507.02252·cs.CV·July 4, 2025

TL;DR

SurgVisAgent is a multimodal agent that identifies the type and severity of distortion in a surgical image and applies the matching enhancement, with the aim of supporting surgeons' decision-making in complex real-world scenarios.

Contribution

The paper introduces a unified, end-to-end surgical vision agent that uses multimodal large language models (MLLMs) for multi-task image enhancement.

Findings

01

Outperforms traditional single-task models in diverse enhancement tasks

02

Effectively identifies distortion categories and severity levels in surgical images

03

Demonstrates potential as a comprehensive surgical assistance tool

Abstract

Precise surgical interventions are vital to patient safety, and advanced enhancement algorithms have been developed to assist surgeons in decision-making. Despite significant progress, these algorithms are typically designed for single tasks in specific scenarios, limiting their effectiveness in complex real-world situations. To address this limitation, we propose SurgVisAgent, an end-to-end intelligent surgical vision agent built on multimodal large language models (MLLMs). SurgVisAgent dynamically identifies distortion categories and severity levels in endoscopic images, enabling it to perform a variety of enhancement tasks such as low-light enhancement, overexposure correction, motion blur elimination, and smoke removal. Specifically, to achieve superior surgical scenario understanding, we design a prior model that provides domain-specific knowledge. Additionally, through in-context…
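
The abstract describes a two-step idea: diagnose the distortion category and severity, then apply the matching enhancement. A minimal sketch of that routing step follows. It is an illustration only, not the paper's implementation: the diagnosis is supplied by hand rather than produced by an MLLM, the names `Distortion`, `Diagnosis`, `HANDLERS`, and `enhance` are hypothetical, the 1 to 3 severity scale is an assumption, and the gamma curves are toy stand-ins for real enhancement models.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List

# Toy image type (assumption): a flat list of intensities in [0, 1].
Image = List[float]


class Distortion(Enum):
    """The four enhancement tasks named in the abstract."""
    LOW_LIGHT = "low_light"
    OVEREXPOSURE = "overexposure"
    MOTION_BLUR = "motion_blur"
    SMOKE = "smoke"


@dataclass(frozen=True)
class Diagnosis:
    category: Distortion
    severity: int  # hypothetical scale: 1 (mild) to 3 (severe)


def gamma_adjust(image: Image, gamma: float) -> Image:
    """Apply a gamma curve, clamping inputs and outputs to [0, 1]."""
    return [min(1.0, max(0.0, p) ** gamma) for p in image]


def enhance_low_light(image: Image, severity: int) -> Image:
    # gamma < 1 brightens; higher severity gives a smaller gamma.
    return gamma_adjust(image, 1.0 / (1.0 + 0.5 * severity))


def enhance_overexposure(image: Image, severity: int) -> Image:
    # gamma > 1 darkens; higher severity gives a larger gamma.
    return gamma_adjust(image, 1.0 + 0.5 * severity)


def passthrough(image: Image, severity: int) -> Image:
    # Placeholder: deblurring and desmoking need real models.
    return list(image)


HANDLERS: Dict[Distortion, Callable[[Image, int], Image]] = {
    Distortion.LOW_LIGHT: enhance_low_light,
    Distortion.OVEREXPOSURE: enhance_overexposure,
    Distortion.MOTION_BLUR: passthrough,
    Distortion.SMOKE: passthrough,
}


def enhance(image: Image, diagnosis: Diagnosis) -> Image:
    """Route the image to the handler for the diagnosed distortion."""
    return HANDLERS[diagnosis.category](image, diagnosis.severity)
```

Keeping the handlers in a dictionary keyed by category means a new distortion type can be supported by adding one entry, without touching the routing function.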
