Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency
Weijie Ma, Ye Zhu, Ruimao Zhang, Jie Yang, Yiwen Hu, Zhen Li, Li Xiang

TL;DR
This paper introduces a Transformer-based method that uses structured cross-modal representation consistency between paired NBI and white-light (WL) images to improve colorectal polyp classification on WL images, so that NBI images are not needed when classifying in clinical settings.
Contribution
The proposed approach aligns the features of paired NBI and WL images via a Spatial Attention Module, so that accurate classification can be performed on white-light images alone.
Findings
Outperforms recent methods in classification accuracy.
Achieves multi-modal prediction with a single Transformer.
Significantly improves WL image classification in clinical scenarios.
Abstract
Colorectal polyp classification is a critical clinical examination. To improve classification accuracy, most computer-aided diagnosis algorithms recognize colorectal polyps using Narrow-Band Imaging (NBI). However, NBI is often unavailable in real clinical scenarios, since acquiring these images requires manually switching the light mode once polyps have been detected in White-Light (WL) images. To avoid this situation, we propose a novel method that directly achieves accurate white-light colonoscopy image classification by enforcing structured cross-modal representation consistency. In practice, a pair of multi-modal images, i.e., NBI and WL, is fed into a shared Transformer to extract hierarchical feature representations. Then a newly designed Spatial Attention Module (SAM) is adopted to calculate the similarities between…
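The abstract is cut off before it says what the SAM similarities are computed between, so the following is only a minimal sketch of what a cross-modal consistency objective of this kind could look like. It assumes, purely for illustration, that each feature level yields one class token plus a set of patch tokens, that the spatial attention map is a softmax over scaled dot-product similarities between the class token and the patch tokens, and that consistency is measured with mean squared error. The function names (`spatial_attention`, `consistency_loss`), the scaling, and the choice of loss are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def spatial_attention(cls_token, patch_tokens):
    """Assumed attention map: softmax over the scaled dot-product
    similarity between one class token and every patch token."""
    scale = math.sqrt(len(cls_token))
    return softmax([dot(cls_token, p) / scale for p in patch_tokens])


def consistency_loss(wl_levels, nbi_levels):
    """Assumed objective: for each feature level, penalise the distance
    between the WL and NBI class tokens and between their attention maps.

    Each argument is a list of (cls_token, patch_tokens) pairs, one per
    level, as a shared encoder would produce for the two modalities.
    """
    total = 0.0
    for (wl_cls, wl_patches), (nbi_cls, nbi_patches) in zip(wl_levels, nbi_levels):
        total += mse(wl_cls, nbi_cls)
        total += mse(
            spatial_attention(wl_cls, wl_patches),
            spatial_attention(nbi_cls, nbi_patches),
        )
    return total


# Toy features standing in for one level of encoder output per modality.
wl = [([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])]
nbi = [([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])]
print(consistency_loss(wl, wl))   # identical features give zero loss
print(consistency_loss(wl, nbi))  # mismatched features give a positive loss
```

Under these assumptions, a term like this would be added to the usual classification loss during training on NBI/WL pairs, while prediction would pass a WL image through the shared encoder on its own.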
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Colorectal Cancer Screening and Detection
Methods: Attention Is All You Need · Linear Layer · Softmax · Absolute Position Encodings · Label Smoothing · Residual Connection · Byte Pair Encoding · Adam · Layer Normalization · Position-Wise Feed-Forward Layer
