TL;DR
Sum-of-Checks decomposes Critical View of Safety (CVS) assessment into expert-defined visual checks, making large vision-language models more accurate and easier to audit in laparoscopic cholecystectomy.
Contribution
The paper introduces a structured framework that decomposes surgical reasoning into verifiable checks, improving the reliability and auditability of AI in surgical safety assessment.
Findings
Sum-of-Checks improves average frame-level mean average precision by 12-14%.
LVLMs are reliable on observational checks but variable on anatomical evidence.
Explicitly separating evidence from decision-making enhances AI transparency in surgery.
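The first finding is reported in frame-level mean average precision (mAP). As a reminder of what that metric measures, the sketch below computes average precision per criterion from per-frame scores and then averages across criteria. It is a generic illustration, not the paper's evaluation code: the criterion names, labels, and scores are made up, and the paper's exact protocol is not described in this summary.

```python
from typing import Dict, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one criterion: mean of precision@k over ranks k holding a positive frame."""
    # Rank frames from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # Convention chosen here: AP is 0.0 when a criterion has no positive frames.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    labels_by_criterion: Dict[str, Sequence[int]],
    scores_by_criterion: Dict[str, Sequence[float]],
) -> float:
    """Average the per-criterion AP values."""
    aps = [
        average_precision(labels_by_criterion[name], scores_by_criterion[name])
        for name in labels_by_criterion
    ]
    return sum(aps) / len(aps)


# Hypothetical data: two placeholder criteria, a handful of frames each.
example_labels = {"criterion_a": [1, 0, 1, 0], "criterion_b": [0, 1]}
example_scores = {"criterion_a": [0.9, 0.8, 0.7, 0.1], "criterion_b": [0.2, 0.6]}
example_map = mean_average_precision(example_labels, example_scores)
```

Because AP depends only on how frames are ranked, a method that emits graded criterion scores can be compared with one that emits raw probabilities without calibrating either.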
Abstract
Purpose: Accurate assessment of the Critical View of Safety (CVS) during laparoscopic cholecystectomy is essential to prevent bile duct injury, a complication associated with significant morbidity and mortality. While large vision-language models (LVLMs) offer flexible reasoning, their predictions remain difficult to audit and unreliable on safety-critical surgical tasks. Methods: We introduce Sum-of-Checks, a framework that decomposes each CVS criterion into expert-defined reasoning checks reflecting clinically relevant visual evidence. Given a laparoscopic frame, an LVLM evaluates each check, producing a binary judgment and justification. Criterion-level scores are computed via fixed, weighted aggregation of check outcomes. We evaluate on the Endoscapes2023 benchmark using three frontier LVLMs, comparing against direct prompting, chain-of-thought, and sub-question decomposition,…
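The aggregation step in the Methods can be pictured with a minimal sketch: each check yields a binary judgment plus a justification, and a criterion score is a fixed weighted combination of the judgments. Everything concrete below is an assumption for illustration only: the `CheckResult` type, the check names, the weights, and the choice to normalize by total weight do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CheckResult:
    """One check outcome as an LVLM might return it (hypothetical structure)."""
    name: str
    passed: bool
    justification: str


def criterion_score(results: List[CheckResult], weights: Dict[str, float]) -> float:
    """Weighted fraction of passed checks, in [0, 1].

    Assumption: weights are fixed ahead of time and the score is normalized
    by the total weight of the checks that were evaluated.
    """
    total = sum(weights[r.name] for r in results)
    if total <= 0:
        raise ValueError("weights of the evaluated checks must sum to a positive value")
    passed = sum(weights[r.name] for r in results if r.passed)
    return passed / total


# Hypothetical checks and weights for a single criterion on a single frame.
example_weights = {"structure_visible": 0.5, "tissue_cleared": 0.3, "no_occlusion": 0.2}
example_results = [
    CheckResult("structure_visible", True, "Two tubular structures are visible."),
    CheckResult("tissue_cleared", False, "Fat still covers part of the region."),
    CheckResult("no_occlusion", True, "Instruments do not block the view."),
]
example_score = criterion_score(example_results, example_weights)
```

Keeping the weights outside the model call means the decision rule stays fixed and inspectable: a reviewer can see which check failed and read its justification without re-running the model.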
