Spatial-ORMLLM: Improve Spatial Relation Understanding in the Operating Room with Multimodal Large Language Model
Peiqi He, Zhenhao Zhang, Yixiang Zhang, Xiongjun Zhao, Shaoliang Peng

TL;DR
Spatial-ORMLLM is a novel large vision-language model designed for 3D spatial reasoning in operating rooms, using only RGB data to enhance surgical scene understanding and support clinical decision-making.
Contribution
It introduces a new 3D spatial reasoning framework that leverages 2D RGB data without extra sensors, integrating spatial knowledge into a unified large multimodal model.
Findings
Achieves state-of-the-art performance on clinical datasets
Generalizes well to unseen surgical scenarios
Enhances spatial understanding in complex surgical scenes
Abstract
Precise spatial modeling in the operating room (OR) is foundational to many clinical tasks, supporting intraoperative awareness, hazard avoidance, and surgical decision-making. While existing approaches leverage large-scale multimodal datasets for latent-space alignment to implicitly learn spatial relationships, they overlook the 3D capabilities of MLLMs. However, this approach raises two issues: (1) Operating rooms typically lack multiple video and audio sensors, making multimodal 3D data difficult to obtain; (2) Training solely on readily available 2D data fails to capture fine-grained details in complex scenes. To address this gap, we introduce Spatial-ORMLLM, the first large vision-language model for 3D spatial reasoning in operating rooms using only RGB modality to infer volumetric and semantic cues, enabling downstream medical tasks with detailed and holistic spatial context.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Surgical Simulation and Training · Augmented Reality Applications
