AffordTissue: Dense Affordance Prediction for Tool-Action Specific Tissue Interaction
Aiza Maksutova, Lalithkumar Seenivasan, Hao Ding, Jiru Xu, Chenhao Yu, Chenyan Jing, Yiqing Shen, Mathias Unberath

TL;DR
AffordTissue introduces a multimodal framework for dense tissue affordance prediction in surgery, aimed at safer and more precise tool-tissue interactions during procedures such as cholecystectomy.
Contribution
The paper presents the first tissue affordance benchmark and a novel multimodal model that outperforms existing vision-language baselines in dense surgical affordance prediction.
Findings
Substantially lower average symmetric surface distance (ASSD) than vision-language model baselines: 20.6 px vs. 60.2 px (lower is better).
Curated and annotated 15,638 video clips across 103 surgeries.
Demonstrated potential for explicit spatial reasoning and safe automation in surgery.
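ASSD measures how far apart two region boundaries are on average. With S_A and S_B as the boundary pixels of the prediction and the ground truth, the standard definition is ASSD = (Σ_{a∈S_A} d(a, S_B) + Σ_{b∈S_B} d(b, S_A)) / (|S_A| + |S_B|), where d(x, S) is the Euclidean distance from x to the nearest point of S. The sketch below applies this definition to two binary masks using only NumPy. It is an illustration, not the authors' evaluation code. The helper names `boundary_points` and `assd` are hypothetical, and the use of 4-connectivity to define boundary pixels is an assumption. This summary does not say how AffordTissue converts its dense heatmaps into regions before scoring.

```python
import numpy as np


def boundary_points(mask: np.ndarray) -> np.ndarray:
    """Return (row, col) coordinates of foreground pixels that touch the background.

    Assumption: 4-connectivity, so a pixel is interior only if it and its
    four direct neighbours are all foreground.
    """
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1]   # neighbour above
        & padded[2:, 1:-1]    # neighbour below
        & padded[1:-1, :-2]   # neighbour left
        & padded[1:-1, 2:]    # neighbour right
    )
    return np.argwhere(mask & ~interior).astype(float)


def assd(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average symmetric surface distance (in pixels) between two binary masks."""
    a = boundary_points(pred)
    b = boundary_points(gt)
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks need at least one foreground pixel")
    # Pairwise Euclidean distances between the two boundary sets, shape (|A|, |B|).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # nearest ground-truth boundary point for each predicted point
    b_to_a = d.min(axis=0)  # nearest predicted boundary point for each ground-truth point
    return float((a_to_b.sum() + b_to_a.sum()) / (len(a) + len(b)))
```

To score a heatmap, you would first binarize it, for example with `pred_mask = heatmap >= 0.5`. The 0.5 threshold is an assumption for illustration. The brute-force distance matrix keeps the sketch dependency-free but uses O(|A|·|B|) memory. For large frames, a distance transform or a KD-tree is the usual alternative.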
Abstract
Surgical action automation has progressed rapidly toward surgeon-like dexterous control, driven primarily by advances in learning from demonstration and vision-language-action models. While these methods have succeeded in table-top experiments, translating them to clinical deployment remains challenging: current methods offer limited predictability of where instruments will interact with tissue surfaces and lack explicit conditioning inputs to enforce tool-action-specific safe interaction regions. To address this gap, we introduce AffordTissue, a multimodal framework for predicting tool-action-specific tissue affordance regions as dense heatmaps during cholecystectomy. Our approach combines a temporal vision encoder capturing tool motion and tissue dynamics across multiple viewpoints, language conditioning enabling generalization across diverse instrument-action pairs, and a…
