HiFiSeg: High-Frequency Information Enhanced Polyp Segmentation with Global-Local Vision Transformer

Jingjing Ren; Xiaoyong Zhang; Lina Zhang

arXiv:2410.02528 · cs.CV · October 11, 2024

TL;DR

HiFiSeg is a novel vision transformer-based network that enhances high-frequency feature processing to improve colon polyp segmentation accuracy, especially for small targets and boundary details.

Contribution

It introduces a global-local interaction module and a selective aggregation module within a pyramid vision transformer framework for better high-frequency information capture.

Findings

01 HiFiSeg achieves state-of-the-art mDice scores of 0.826 on CVC-ColonDB and 0.822 on ETIS.

02 It effectively captures boundary details and small targets in polyp segmentation.

03 It outperforms existing methods in complex scenarios.
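
Since the findings are reported as mDice, a minimal sketch of how a mean Dice score is commonly computed may help. It assumes binary masks given as nested lists of 0/1 values and averages the per-image Dice score over a dataset. The function names `dice_score` and `mean_dice` and the smoothing term `eps` are illustrative assumptions, not taken from the paper, whose exact evaluation protocol is not described on this page.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask: rows of 0/1 values


def dice_score(pred: Mask, target: Mask, eps: float = 1e-8) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|) for two binary masks of equal shape.

    `eps` (an assumption of this sketch) avoids division by zero when
    both masks are empty; in that case the score evaluates to 1.0.
    """
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, g in zip(pred_row, target_row):
            intersection += p * g
            pred_sum += p
            target_sum += g
    return (2.0 * intersection + eps) / (pred_sum + target_sum + eps)


def mean_dice(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-image Dice score over a set of mask pairs."""
    scores = [dice_score(p, g) for p, g in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    # One overlapping pixel, 2 predicted and 1 true pixel: 2*1 / (2+1)
    print(round(dice_score(predicted, ground_truth), 4))
```

A score of 1.0 means the predicted and ground-truth masks coincide exactly, so values such as 0.826 indicate substantial but imperfect overlap averaged across test images.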

Abstract

Numerous studies have demonstrated the strong performance of Vision Transformer (ViT)-based methods across various computer vision tasks. However, ViT models often struggle to effectively capture high-frequency components in images, which are crucial for detecting small targets and preserving edge details, especially in complex scenarios. This limitation is particularly challenging in colon polyp segmentation, where polyps exhibit significant variability in structure, texture, and shape. High-frequency information, such as boundary details, is essential for achieving precise semantic segmentation in this context. To address these challenges, we propose HiFiSeg, a novel network for colon polyp segmentation that enhances high-frequency information processing through a global-local vision transformer framework. HiFiSeg leverages the pyramid vision transformer (PVT) as its encoder and…
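
To make "high-frequency information" concrete, the sketch below separates a small grayscale image into a smooth (low-frequency) part and a residual (high-frequency) part, which is non-zero only near edges. This is a generic illustration, not HiFiSeg's global-local interaction or selective aggregation module: the 3x3 box blur, the edge-replication padding, and the function names `box_blur` and `high_frequency_residual` are all assumptions chosen for simplicity.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def box_blur(image: Image) -> Image:
    """3x3 mean filter with edge replication; a simple low-pass filter."""
    height, width = len(image), len(image[0])
    blurred = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse their neighbours.
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += image[yy][xx]
            blurred[y][x] = total / 9.0
    return blurred


def high_frequency_residual(image: Image) -> Image:
    """Subtract the blurred image from the original, leaving edges and fine detail."""
    low = box_blur(image)
    return [
        [image[y][x] - low[y][x] for x in range(len(image[0]))]
        for y in range(len(image))
    ]


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    for row in high_frequency_residual(step):
        print([round(v, 3) for v in row])
```

In the step-edge example the residual is zero in the flat regions and non-zero only in the two columns adjacent to the boundary, which is the kind of signal the abstract describes as important for small targets and edge detail.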

Taxonomy

Topics: Vehicle License Plate Recognition · Engineering Applied Research

Methods: Attention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings