OR-VSKC: Resolving Visual-Semantic Knowledge Conflicts in Operating Rooms with Synthetic Data-Guided Alignment

Weiyi Zhao; Xiaoyu Tan; Liang Liu; Sijia Li; Youwei Song; Xihe Qiu

arXiv:2506.22500 · cs.CV · May 1, 2026

TL;DR

This paper introduces OR-VSKC, a synthetic-image benchmark for studying visual-semantic knowledge conflicts in surgical safety risk detection. It is built to work around the scarcity and privacy constraints of real operating-room data.

Contribution

The paper presents a new synthetic dataset and benchmark for analyzing and mitigating knowledge conflicts in multimodal models within surgical environments.

Findings

01  State-of-the-art models show significant reliability gaps in OR safety tasks.

02  Fine-tuning on OR-VSKC improves model robustness and generalization.

03  The synthetic benchmark enables effective research in safety-critical medical AI.

Abstract

Automated identification of surgical safety risks is critical for improving patient outcomes; however, Multimodal Large Language Models (MLLMs) frequently suffer from Visual-Semantic Knowledge Conflicts (VS-KC), a phenomenon where models possess safety knowledge but fail to activate it during visual inspection. Investigating this alignment gap in operating rooms (ORs) is impeded by a critical bottleneck: the scarcity and privacy constraints of real-world OR data depicting safety violations. To address this, we introduce OR-VSKC, a benchmark for studying VS-KC and surgical risk perception in strictly regulated OR environments. Constructed via our Protocol-to-Pixel Generative Framework, OR-VSKC comprises 28,190 high-fidelity synthetic images grounded in authoritative safety standards, complemented by a 713-image expert-authored challenge subset validated by multiple experts. The full…
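
To make the VS-KC definition above concrete, here is a minimal sketch of how such a conflict could be counted. It is not the paper's metric: the record layout (`ProbeResult`), the field names, and the `conflict_rate` function are all hypothetical, and the toy records are made up for illustration. The assumption is that each safety rule is probed twice, once as a text-only knowledge question and once on an image depicting the violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """One paired probe for a single safety rule (hypothetical record layout)."""
    rule_id: str
    knows_rule: bool      # model answered the text-only question about the rule correctly
    flags_in_image: bool  # model flagged the violation when shown an image depicting it


def conflict_rate(results: list[ProbeResult]) -> float:
    """Fraction of known rules that the model fails to apply on the image.

    Items where the model does not know the rule are excluded: a miss there
    is a knowledge gap, not a knowledge conflict.
    """
    known = [r for r in results if r.knows_rule]
    if not known:
        return 0.0
    conflicts = sum(1 for r in known if not r.flags_in_image)
    return conflicts / len(known)


# Toy, made-up records for illustration only (not results from the paper).
toy = [
    ProbeResult("rule_a", knows_rule=True, flags_in_image=True),
    ProbeResult("rule_b", knows_rule=True, flags_in_image=False),
    ProbeResult("rule_c", knows_rule=False, flags_in_image=False),
    ProbeResult("rule_d", knows_rule=True, flags_in_image=False),
]
print(conflict_rate(toy))
```

Conditioning on `knows_rule` is the design choice that separates a conflict (knowledge present but not activated on the image) from plain ignorance of the rule; other normalizations are equally plausible.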

Code & Models

Repositories

zgg2577/VS-KC (GitHub)
