Weakly Supervised Segmentation of Hyper-Reflective Foci with Compact Convolutional Transformers and SAM2

Olivier Morelle (1, 2), Justus Bisten (1), Maximilian W. M. Wintergerst (2, 5), Robert P. Finger (2, 4), Thomas Schultz (1, 3)

(1) B-IT, Department of Computer Science, University of Bonn; (2) Department of Ophthalmology, University Hospital Bonn; (3) Lamarr Institute for Machine Learning and Artificial Intelligence; (4) Department of Ophthalmology, University Medical Center Mannheim, Heidelberg University; (5) Augenzentrum Grischun, Chur, Switzerland

arXiv:2501.05933·cs.CV·March 24, 2025

TL;DR

This paper introduces a weakly supervised segmentation framework for small structures in OCT images, such as hyper-reflective foci (HRF). It combines attention-based Multiple Instance Learning (MIL), Layer-wise Relevance Propagation (LRP), SAM 2, and Compact Convolutional Transformers (CCT) to improve spatial resolution and segmentation accuracy.

Contribution

The paper integrates LRP, SAM 2, and CCT into weakly supervised segmentation of small structures, addressing the coarse localization and strong input downsampling that limit most existing weakly supervised methods.

Findings

01. Improved segmentation accuracy for hyper-reflective foci in OCT images.

02. Enhanced spatial resolution and recall through iterative inference.

03. Effective use of CCT and SAM2 in weakly supervised segmentation.

Abstract

Weakly supervised segmentation has the potential to greatly reduce the annotation effort for training segmentation models for small structures such as hyper-reflective foci (HRF) in optical coherence tomography (OCT). However, most weakly supervised methods either involve a strong downsampling of input images, or only achieve localization at a coarse resolution, both of which are unsatisfactory for small structures. We propose a novel framework that increases the spatial resolution of a traditional attention-based Multiple Instance Learning (MIL) approach by using Layer-wise Relevance Propagation (LRP) to prompt the Segment Anything Model (SAM 2), and increases recall with iterative inference. Moreover, we demonstrate that replacing MIL with a Compact Convolutional Transformer (CCT), which adds a positional encoding, and permits an exchange of information between different regions of…
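To make two ingredients named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the common tanh-based form of attention MIL pooling, and it assumes that a relevance map (for example one produced by LRP) is turned into point prompts by taking its highest-scoring pixels. The function names `attention_mil_pool` and `top_k_point_prompts`, the parameters `V` and `w`, and all array shapes are hypothetical, and no SAM 2 call is shown.

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Attention-based MIL pooling (assumed tanh variant).

    instances: (K, D) array of K instance (patch) embeddings.
    V: (L, D) and w: (L,) are hypothetical learned attention parameters.
    Returns the (D,) bag embedding and the (K,) attention weights.
    """
    scores = np.tanh(instances @ V.T) @ w        # one score per instance
    scores = scores - scores.max()               # numerical stability for softmax
    attn = np.exp(scores) / np.exp(scores).sum() # weights sum to 1 over the bag
    bag = attn @ instances                       # attention-weighted average
    return bag, attn

def top_k_point_prompts(relevance, k=3):
    """Return the k highest-relevance pixels as (x, y) point coordinates.

    This is an assumed, simplified way to derive point prompts from a
    relevance map; the paper's actual prompting strategy may differ.
    """
    flat = np.argsort(relevance, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat, relevance.shape)
    return np.stack([cols, rows], axis=1)        # (x, y) = (column, row)

# Toy bag of 16 patch embeddings with 8 features each.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
V = rng.normal(size=(4, 8))
w = rng.normal(size=4)
bag, attn = attention_mil_pool(patches, V, w)

# Toy relevance map with two clear peaks.
relevance = np.zeros((5, 5))
relevance[1, 3] = 0.9
relevance[4, 0] = 0.5
prompts = top_k_point_prompts(relevance, k=2)
```

The attention weights only indicate which instances matter at the resolution of the instances themselves, which is the coarse localization the abstract refers to; a pixel-level relevance map is what allows individual points to be passed on as prompts.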

Taxonomy

Topics: Optical Systems and Laser Technology · Advanced SAR Imaging Techniques

Methods: Absolute Position Encodings · Adam · Residual Connection · Dropout · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Multi-Head Attention · Position-Wise Feed-Forward Layer