Colonoscopy Landmark Detection using Vision Transformers
Aniruddha Tamhane, Tse'ela Mida, Erez Posner, Moshe Bouhnik

TL;DR
This paper introduces a vision-transformer-based algorithm that automatically detects key anatomical landmarks in colonoscopy images, with the aim of streamlining post-procedure documentation and improving clinical workflow.
Contribution
It presents a novel landmark detection method using vision transformers, trained on a new dataset of colonoscopy snapshots and compared against traditional CNN backbones (ResNet-101 and ConvNext-B).
Findings
Achieved 82% accuracy with the vision transformer backbone on the test data.
Compared the vision transformer backbone with ResNet-101 and ConvNext-B backbones, demonstrating competitive performance.
Developed an adaptive gamma correction preprocessing step for consistent image brightness.
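The summary does not say how the gamma value is chosen, so the following is only a minimal sketch of one common form of adaptive gamma correction, not the authors' method. It assumes a grayscale image with intensities in [0, 1] and picks gamma so that the image's mean brightness would map to a target value. The function name `adaptive_gamma` and the parameters `target_mean` and `eps` are hypothetical, introduced here for illustration.

```python
import math


def adaptive_gamma(pixels, target_mean=0.5, eps=1e-6):
    """Apply a per-image gamma so that brightness moves toward target_mean.

    Assumption (not from the paper): gamma is derived from the mean intensity.
    pixels: list of rows, each a list of intensities in [0, 1].
    target_mean: desired brightness, strictly between 0 and 1.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # Clamp the mean away from 0 and 1 so both logarithms are finite and non-zero.
    mean = min(max(mean, eps), 1.0 - eps)
    # Solve mean ** gamma == target_mean for gamma.
    # Dark images give gamma < 1 (brighten); bright images give gamma > 1 (darken).
    gamma = math.log(target_mean) / math.log(mean)
    corrected = [[p ** gamma for p in row] for row in pixels]
    return corrected, gamma


# Usage: a uniformly dark snapshot is brightened, a bright one is darkened.
dark_out, dark_gamma = adaptive_gamma([[0.1, 0.1], [0.1, 0.1]])
bright_out, bright_gamma = adaptive_gamma([[0.9, 0.9], [0.9, 0.9]])
```

Because gamma is recomputed for every image, dark and overexposed snapshots are pulled toward the same brightness range before they reach the classifier, which is the consistency the finding above describes. For colour images, a sketch like this would typically be applied to a luminance channel rather than to each channel separately.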
Abstract
Colonoscopy is a routine outpatient procedure used to examine the colon and rectum for any abnormalities including polyps, diverticula and narrowing of colon structures. A significant amount of the clinician's time is spent in post-processing snapshots taken during the colonoscopy procedure, for maintaining medical records or further investigation. Automating this step can save time and improve the efficiency of the process. In our work, we have collected a dataset of 120 colonoscopy videos and 2416 snapshots taken during the procedure that have been annotated by experts. Further, we have developed a novel, vision-transformer-based landmark detection algorithm that identifies key anatomical landmarks (the appendiceal orifice, ileocecal valve/cecum landmark and rectum retroflexion) from snapshots taken during colonoscopy. Our algorithm uses an adaptive gamma correction during…
Taxonomy
Topics: Colorectal Cancer Screening and Detection
Methods: Attention Is All You Need · Test · Linear Layer · Softmax · Residual Connection · Dense Connections · Multi-Head Attention · Layer Normalization · Vision Transformer
