A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection
Chao Hao, Zitong Yu, Xin Liu, Jun Xu, Huanjing Yue, Jingyu Yang

TL;DR
This paper introduces SENet, a simple vision Transformer (ViT)-based network for detecting both camouflaged and salient objects. It achieves results competitive with more complex models and, through new modules and training strategies, works well across both tasks.
Contribution
Proposes a simple asymmetric ViT-based encoder-decoder network with local information capture and dynamic loss for improved camouflaged and salient object detection.
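The page names a dynamic loss but does not give its form. The sketch below is a hedged illustration only. It combines binary cross-entropy (BCE) with a soft IoU loss, two losses commonly used for binary COD/SOD maps, and reweights each image by a size factor that grows as the foreground area shrinks. The functions `bce_iou_loss`, `size_weight` and `dynamic_weighted_loss`, the parameter `alpha`, and the linear weighting rule are all hypothetical. They are not the paper's formula.

```python
from typing import Sequence

import numpy as np

def bce_iou_loss(logits: np.ndarray, mask: np.ndarray, eps: float = 1e-6) -> float:
    """BCE plus soft IoU loss for one predicted map (logits) against a 0/1 mask."""
    prob = np.clip(1.0 / (1.0 + np.exp(-logits)), eps, 1.0 - eps)  # sigmoid
    bce = -np.mean(mask * np.log(prob) + (1.0 - mask) * np.log(1.0 - prob))
    inter = np.sum(prob * mask)
    union = np.sum(prob + mask - prob * mask)
    iou_loss = 1.0 - (inter + eps) / (union + eps)
    return float(bce + iou_loss)

def size_weight(mask: np.ndarray, alpha: float = 1.0) -> float:
    """Hypothetical rule: smaller foreground area -> larger weight."""
    area = float(mask.mean())  # foreground fraction in [0, 1]
    return 1.0 + alpha * (1.0 - area)

def dynamic_weighted_loss(logits_batch: Sequence[np.ndarray],
                          mask_batch: Sequence[np.ndarray],
                          alpha: float = 1.0) -> float:
    """Weighted average of per-image losses, using the size-dependent weights."""
    weights = np.array([size_weight(m, alpha) for m in mask_batch])
    losses = np.array([bce_iou_loss(l, m) for l, m in zip(logits_batch, mask_batch)])
    return float(np.sum(weights * losses) / np.sum(weights))
```

Normalizing by the sum of weights keeps the loss on the same scale as the unweighted version. The weighting only shifts emphasis toward images with small, hard-to-find targets.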
Findings
Competitive results on multiple benchmarks.
Enhanced local information modeling improves segmentation.
Joint training strategy benefits SOD performance.
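The page does not say how joint training is organized. One simple possibility, sketched here purely as an assumption, is to shuffle each dataset separately. Single-task batches are then alternated, so every step sees either COD or SOD data, tagged with its task. The generator `joint_batches` and its round-robin schedule are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple

def _batches(items: Sequence, batch_size: int, rng: random.Random) -> List[List]:
    """Shuffle one dataset and cut it into consecutive batches."""
    order = list(range(len(items)))
    rng.shuffle(order)
    return [[items[i] for i in order[k:k + batch_size]]
            for k in range(0, len(order), batch_size)]

def joint_batches(cod: Sequence, sod: Sequence, batch_size: int,
                  seed: int = 0) -> Iterator[Tuple[str, List]]:
    """Yield ("cod", batch) and ("sod", batch) alternately until both are exhausted."""
    rng = random.Random(seed)
    cod_b = _batches(cod, batch_size, rng)
    sod_b = _batches(sod, batch_size, rng)
    for i in range(max(len(cod_b), len(sod_b))):
        if i < len(cod_b):
            yield "cod", cod_b[i]
        if i < len(sod_b):
            yield "sod", sod_b[i]
```

Single-task batches make task-specific loss weights easy to apply. Mixing both tasks inside one batch would be an equally plausible alternative.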
Abstract
Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks that have been widely studied over the past decades. Although both aim to segment an image into binary foreground and background regions, COD focuses on concealed objects hidden in the image, while SOD concentrates on the most prominent objects in the image. Previous works achieved good performance by stacking various hand-designed modules and multi-scale features. However, these carefully designed, complex networks often performed well on one task but not on the other. In this work, we propose a simple yet effective network (SENet) based on the vision Transformer (ViT). By employing a simple asymmetric ViT-based encoder-decoder structure, we yield competitive results on both tasks, exhibiting greater…
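The abstract's central design pairs a large ViT encoder with a much lighter decoder that outputs a binary foreground map. The minimal, untrained NumPy sketch below illustrates that asymmetry. It is a sketch under stated assumptions, not SENet's implementation. The class `AsymmetricSegViT`, the helper names, the depths and widths (6 blocks of width 64 in the encoder, 1 block of width 32 in the decoder), single-head attention, ReLU MLPs and the linear per-patch mask head are all hypothetical choices. It also omits the local information capture module and any pretraining.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; all weights below are random and untrained

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def patchify(img: np.ndarray, p: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, p*p*C) tokens in row-major patch order."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape((h // p) * (w // p), p * p * c)

class Block:
    """Pre-norm Transformer block: single-head self-attention + ReLU MLP (hypothetical)."""
    def __init__(self, dim: int):
        s = 1.0 / np.sqrt(dim)
        self.wq, self.wk, self.wv, self.wo = (rng.normal(0, s, (dim, dim)) for _ in range(4))
        self.w1 = rng.normal(0, s, (dim, 4 * dim))
        self.w2 = rng.normal(0, s / 2, (4 * dim, dim))

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = layer_norm(x)
        q, k, v = h @ self.wq, h @ self.wk, h @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) token-to-token weights
        x = x + (attn @ v) @ self.wo
        return x + np.maximum(layer_norm(x) @ self.w1, 0.0) @ self.w2

class AsymmetricSegViT:
    """Deep, wide encoder + shallow, narrow decoder -> per-pixel foreground logits."""
    def __init__(self, img_size=32, patch=8, channels=3,
                 enc_dim=64, enc_depth=6, dec_dim=32, dec_depth=1):
        n = (img_size // patch) ** 2
        self.p = patch
        self.embed = rng.normal(0, 0.02, (patch * patch * channels, enc_dim))
        self.pos = rng.normal(0, 0.02, (n, enc_dim))  # absolute position embedding
        self.encoder = [Block(enc_dim) for _ in range(enc_depth)]
        self.proj = rng.normal(0, 1.0 / np.sqrt(enc_dim), (enc_dim, dec_dim))
        self.decoder = [Block(dec_dim) for _ in range(dec_depth)]
        self.head = rng.normal(0, 1.0 / np.sqrt(dec_dim), (dec_dim, patch * patch))

    def __call__(self, img: np.ndarray) -> np.ndarray:
        h, w, _ = img.shape
        p = self.p
        x = patchify(img, p) @ self.embed + self.pos
        for blk in self.encoder:
            x = blk(x)
        x = x @ self.proj  # narrow the width before the light decoder
        for blk in self.decoder:
            x = blk(x)
        logits = x @ self.head  # (num_patches, p*p): one logit per pixel
        # Undo patchify: (hp, wp, p, p) -> (hp, p, wp, p) -> (H, W)
        return logits.reshape(h // p, w // p, p, p).transpose(0, 2, 1, 3).reshape(h, w)

def predict_mask(model: AsymmetricSegViT, img: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-model(img)))  # sigmoid -> foreground probability
```

Most of the capacity sits in the encoder. The decoder only needs to turn encoder tokens into a binary foreground/background map.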
Taxonomy
Topics: Visual Attention and Saliency Detection · Infrared Target Detection Methodologies · Advanced Image Fusion Techniques
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Multi-Head Attention · Softmax · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings
