# Estimating 2D Keypoints of Surgical Tools Using Vision-Language Models with Low-Rank Adaptation

**Authors:** Krit Duangprom, Tryphon Lambrou, and Binod Bhattarai

arXiv: 2508.20830 · 2025-08-29

## TL;DR

This paper introduces a new method for 2D surgical tool keypoint detection using vision-language models fine-tuned with low-rank adaptation, achieving high performance with minimal training data.

## Contribution

It proposes leveraging pre-trained vision-language models with LoRA for efficient, low-resource 2D keypoint estimation in surgical tools, outperforming traditional CNN and Transformer methods.

## Key findings

- Outperforms baseline models after only two epochs of fine-tuning
- Demonstrates effectiveness of LoRA in low-resource medical imaging scenarios
- Enables future extension to 3D pose estimation of surgical tools

## Abstract

This paper presents a novel pipeline for 2D keypoint estima- tion of surgical tools by leveraging Vision Language Models (VLMs) fine- tuned using a low rank adjusting (LoRA) technique. Unlike traditional Convolutional Neural Network (CNN) or Transformer-based approaches, which often suffer from overfitting in small-scale medical datasets, our method harnesses the generalization capabilities of pre-trained VLMs. We carefully design prompts to create an instruction-tuning dataset and use them to align visual features with semantic keypoint descriptions. Experimental results show that with only two epochs of fine tuning, the adapted VLM outperforms the baseline models, demonstrating the ef- fectiveness of LoRA in low-resource scenarios. This approach not only improves keypoint detection performance, but also paves the way for future work in 3D surgical hands and tools pose estimation.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20830/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20830/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/2508.20830/full.md

---
Source: https://tomesphere.com/paper/2508.20830