TL;DR
This paper introduces a noise-aware contrastive learning method leveraging temporal structure in colonoscopy videos to learn robust polyp representations without extensive manual labeling.
Contribution
It proposes a novel self-supervised approach with a noise-aware loss that effectively utilizes temporal associations in videos, reducing reliance on costly annotations.
Findings
Outperforms prior self-supervised and supervised methods across multiple tasks.
Achieves results comparable to or better than recent foundation models.
Uses only 27 videos to train a lightweight encoder.
Abstract
Learning robust representations of polyp tracklets is key to enabling multiple AI-assisted colonoscopy applications, from polyp characterization to automated reporting and retrieval. Supervised contrastive learning is an effective approach for learning such representations, but it typically relies on correct positive and negative definitions. Collecting these labels requires linking tracklets that depict the same underlying polyp entity throughout the video, which is costly and demands specialized clinical expertise. In this work, we leverage the sequential workflow of colonoscopy procedures to derive self-supervised associations from temporal structure. Since temporally derived associations are not guaranteed to be correct, we introduce a noise-aware contrastive loss to account for noisy associations. We demonstrate the effectiveness of the learned representations across multiple…
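The two ideas in the abstract, deriving associations from temporal structure and tolerating wrong associations in the loss, can be illustrated with a minimal sketch. The excerpt above does not give the paper's actual pairing rule or loss, so everything here is an assumption made for illustration: the time-gap rule, the names `temporal_positive_mask` and `robust_info_nce`, and the parameters `max_gap`, `temperature`, and `q` are all hypothetical. The loss swaps in a generalized cross-entropy style term, (1 - p^q) / q, applied to InfoNCE probabilities. That term is a generic noise-robust choice, not the authors' noise-aware loss.

```python
import numpy as np


def temporal_positive_mask(starts, ends, max_gap):
    """Mark tracklet pairs as (possibly noisy) positives if they are close in time.

    Hypothetical rule: two tracklets are associated when the time between the
    end of the earlier one and the start of the later one is at most max_gap.
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    # gap[i, j] is the idle time between tracklets i and j (negative if they overlap).
    gap = np.maximum(starts[:, None] - ends[None, :],
                     starts[None, :] - ends[:, None])
    mask = gap <= max_gap
    np.fill_diagonal(mask, False)  # a tracklet is not its own positive
    return mask


def robust_info_nce(embeddings, pos_mask, temperature=0.1, q=0.5):
    """InfoNCE probabilities with a generalized cross-entropy style penalty.

    Stand-in for a noise-aware loss (an assumption, not the paper's loss):
    (1 - p**q) / q is bounded by 1/q, so a wrongly linked pair with p near 0
    contributes a bounded penalty instead of the unbounded -log(p).
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    pair_loss = (1.0 - probs ** q) / q
    n_pos = int(pos_mask.sum())
    if n_pos == 0:
        return 0.0
    return float(pair_loss[pos_mask].sum() / n_pos)


if __name__ == "__main__":
    # Three toy tracklets with start/end times in seconds (made-up values).
    mask = temporal_positive_mask([0, 12, 100], [10, 20, 110], max_gap=5)
    toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    print(mask)
    print(robust_info_nce(toy_embeddings, mask))
```

In this toy setup only the first two tracklets are linked, because they are 2 seconds apart, while the third starts much later. Lowering `q` toward 0 makes the penalty approach the standard log loss, and `q = 1` gives a linear penalty in the probability, which is the usual trade-off between fitting speed and robustness for this family of losses.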
