Beyond Rigid AI: Towards Natural Human-Machine Symbiosis for Interoperative Surgical Assistance
Lalithkumar Seenivasan, Jiru Xu, Roger D. Soberanis Mukul, Hao Ding, Grayson Byrd, Yu-Chun Ku, Jose L. Porras, Masaru Ishii, and Mathias Unberath

TL;DR
This paper presents a novel perception agent that uses advanced language and vision models to enable flexible, natural, real-time human-machine interaction in surgical assistance, overcoming the rigidity of traditional AI systems.
Contribution
Introduces a perception agent integrating large language models and foundation models for flexible, real-time intraoperative surgical scene understanding with memory capabilities.
Findings
Performance comparable to manual prompting strategies
Effective segmentation of known and unseen surgical elements
Demonstrated flexibility in segmenting novel surgical scene components
Abstract
Emerging surgical data science and robotics solutions, especially those designed to provide assistance in situ, require natural human-machine interfaces to fully unlock their potential in providing adaptive and intuitive aid. Contemporary AI-driven solutions remain inherently rigid, offering limited flexibility and restricting natural human-machine interaction in dynamic surgical environments. These solutions rely heavily on extensive task-specific pre-training, fixed object categories, and explicit manual-prompting. This work introduces a novel Perception Agent that leverages speech-integrated prompt-engineered large language models (LLMs), segment anything model (SAM), and any-point tracking foundation models to enable a more natural human-machine interaction in real-time intraoperative surgical assistance. Incorporating a memory repository and two novel mechanisms for segmenting…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAugmented Reality Applications · Artificial Intelligence in Healthcare and Education
