HiFiSeg: High-Frequency Information Enhanced Polyp Segmentation with Global-Local Vision Transformer
Jingjing Ren, Xiaoyong Zhang, Lina Zhang

TL;DR
HiFiSeg is a novel vision transformer-based network that enhances high-frequency feature processing to improve colon polyp segmentation accuracy, especially for small targets and boundary details.
Contribution
It introduces a global-local interaction module and a selective aggregation module within a pyramid vision transformer (PVT) framework to better capture high-frequency information.
Findings
Achieved state-of-the-art mDice scores of 0.826 and 0.822 on the CVC-ColonDB and ETIS datasets, respectively.
Effectively captures boundary details and small targets in polyp segmentation.
Outperforms existing methods in complex scenarios.
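The mDice figures in the findings are mean Dice coefficients: each test image gets a Dice score, Dice = 2|P ∩ G| / (|P| + |G|), where P is the predicted mask and G is the ground truth, and the per-image scores are averaged. The sketch below computes this for binary masks. It assumes the predictions are already thresholded to 0/1. The small `eps` term, which makes two empty masks score 1.0, is also an assumption. Evaluation toolkits differ in their thresholding and smoothing conventions, and this summary does not say which one HiFiSeg used. The function names `dice` and `mean_dice` are hypothetical.

```python
def dice(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks given as 2-D lists of 0/1.

    eps is an assumed smoothing term: it avoids 0/0 when both masks are empty
    (in which case the score is 1.0).
    """
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    if len(flat_p) != len(flat_t):
        raise ValueError("masks must have the same number of pixels")
    # |P ∩ G|: pixels that are foreground in both masks
    inter = sum(p * t for p, t in zip(flat_p, flat_t))
    # |P| + |G|: total foreground pixels across both masks
    total = sum(flat_p) + sum(flat_t)
    return (2.0 * inter + eps) / (total + eps)


def mean_dice(preds, targets):
    """mDice: average of per-image Dice scores over a test set."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally many (non-zero) predictions and targets")
    scores = [dice(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[0, 1], [1, 1]]    # 3 foreground pixels
    target = [[0, 1], [0, 1]]  # 2 foreground pixels, 2 shared with pred
    print(round(dice(pred, target), 3))  # prints 0.8
```

Because each image contributes equally to the average, a small polyp that is badly segmented lowers mDice as much as a large one. This is one reason the metric rewards the small-target accuracy that the TL;DR emphasizes.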
Abstract
Numerous studies have demonstrated the strong performance of Vision Transformer (ViT)-based methods across various computer vision tasks. However, ViT models often struggle to effectively capture high-frequency components in images, which are crucial for detecting small targets and preserving edge details, especially in complex scenarios. This limitation is particularly challenging in colon polyp segmentation, where polyps exhibit significant variability in structure, texture, and shape. High-frequency information, such as boundary details, is essential for achieving precise semantic segmentation in this context. To address these challenges, we propose HiFiSeg, a novel network for colon polyp segmentation that enhances high-frequency information processing through a global-local vision transformer framework. HiFiSeg leverages the pyramid vision transformer (PVT) as its encoder and…
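The abstract's premise that boundary detail lives in the high-frequency band can be illustrated with a generic decomposition. Subtract a low-pass (box-blurred) copy of an image from the original, and the residual concentrates at edges while flat regions go to zero. This is a textbook illustration only. It is not HiFiSeg's global-local interaction module or its selective aggregation module, whose details are cut off in the abstract above. The names `box_blur` and `high_freq` and the 3×3 window are hypothetical choices made for this sketch.

```python
def box_blur(img, k=1):
    """Low-pass filter: mean over a (2k+1)x(2k+1) window, replicating border pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    # clamp indices so border pixels are repeated outside the image
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += img[ii][jj]
            out[i][j] = acc / (2 * k + 1) ** 2
    return out


def high_freq(img, k=1):
    """High-frequency residual: original image minus its low-pass version."""
    low = box_blur(img, k)
    return [[img[i][j] - low[i][j] for j in range(len(img[0]))]
            for i in range(len(img))]


if __name__ == "__main__":
    # 5x6 image with a vertical step edge: background 0 on the left, "polyp" 1 on the right
    img = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
    for row in high_freq(img):
        print([round(v, 3) for v in row])  # nonzero only in the two columns at the edge
```

In this toy image the residual is -1/3 and +1/3 in the two columns on either side of the boundary and exactly 0 everywhere else. An operation that behaves like averaging would preserve the flat interior of a polyp but blur away exactly this signal.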
