TL;DR
This paper introduces OR-VSKC, a synthetic-image benchmark for studying visual-semantic knowledge conflicts (VS-KC) in surgical safety risk detection, addressing the scarcity and privacy constraints of real operating-room data.
Contribution
It presents a new synthetic dataset and benchmark for analyzing and mitigating knowledge conflicts in multimodal models within surgical environments.
Findings
State-of-the-art MLLMs show significant reliability gaps on OR safety tasks.
Fine-tuning on OR-VSKC improves model robustness and generalization.
The synthetic benchmark enables effective research in safety-critical medical AI.
Abstract
Automated identification of surgical safety risks is critical for improving patient outcomes; however, Multimodal Large Language Models (MLLMs) frequently suffer from Visual-Semantic Knowledge Conflicts (VS-KC), a phenomenon where models possess safety knowledge but fail to activate it during visual inspection. Investigating this alignment gap in operating rooms (ORs) is impeded by a critical bottleneck: the scarcity and privacy constraints of real-world OR data depicting safety violations. To address this, we introduce OR-VSKC, a benchmark for studying VS-KC and surgical risk perception in strictly regulated OR environments. Constructed via our Protocol-to-Pixel Generative Framework, OR-VSKC comprises 28,190 high-fidelity synthetic images grounded in authoritative safety standards, complemented by a 713-image expert-authored challenge subset validated by multiple experts. The full…
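The abstract defines VS-KC as a model holding the relevant safety knowledge but failing to apply it when inspecting an image. The snippet below is a minimal sketch of one way such a conflict rate could be computed. It assumes a hypothetical two-probe setup: a text-only question testing whether the model knows a safety rule, and an image-based question testing whether it flags a violation of that rule. The `ProbeResult` fields and the `vs_kc_rate` function are illustrative names, not part of the OR-VSKC evaluation protocol.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    # Hypothetical per-item record; not the benchmark's actual schema.
    item_id: str
    knowledge_correct: bool  # answered the text-only safety-rule question correctly
    visual_correct: bool     # flagged the violation when shown the image


def vs_kc_rate(results: list[ProbeResult]) -> float:
    """Fraction of items where the model knows the rule but misses it visually.

    Only items whose text-only probe was answered correctly enter the
    denominator, because a conflict requires the knowledge to be present.
    """
    knows = [r for r in results if r.knowledge_correct]
    if not knows:
        return 0.0
    conflicts = sum(1 for r in knows if not r.visual_correct)
    return conflicts / len(knows)


if __name__ == "__main__":
    demo = [
        ProbeResult("a", knowledge_correct=True, visual_correct=False),
        ProbeResult("b", knowledge_correct=True, visual_correct=True),
        ProbeResult("c", knowledge_correct=False, visual_correct=False),
        ProbeResult("d", knowledge_correct=True, visual_correct=False),
    ]
    print(round(vs_kc_rate(demo), 3))  # 2 conflicts among 3 "knows" items -> 0.667
```

Conditioning on the text-only probe separates a knowledge gap (item "c" above) from a conflict in the VS-KC sense (items "a" and "d"), which is the distinction the abstract draws.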
