A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Chao Hao; Zitong Yu; Xin Liu; Jun Xu; Huanjing Yue; Jingyu Yang

arXiv:2402.18922·cs.CV·March 1, 2024·3 cites

TL;DR

This paper introduces SENet, a simple vision Transformer-based network that effectively detects camouflaged and salient objects, outperforming complex models and demonstrating versatility across tasks with novel modules and training strategies.

Contribution

Proposes a simple asymmetric ViT-based encoder-decoder network with local information capture and dynamic loss for improved camouflaged and salient object detection.

Findings

01. Competitive results on multiple benchmarks.

02. Enhanced local information modeling improves segmentation.

03. Joint training strategy benefits SOD performance.

Abstract

Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks that have been widely studied over the past decades. Both aim to segment an image into binary foreground and background regions; they differ in that COD focuses on concealed objects hidden in the image, while SOD concentrates on the most prominent objects in the image. Previous works achieved good performance by stacking various hand-designed modules and multi-scale features. However, these carefully designed complex networks often performed well on one task but not on the other. In this work, we propose a simple yet effective network (SENet) based on the vision Transformer (ViT). By employing a simple design of an asymmetric ViT-based encoder-decoder structure, we yield competitive results on both tasks, exhibiting greater…
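To make the shared output format concrete, here is a minimal sketch of the binary foreground/background labeling that the abstract says both COD and SOD produce. It is not SENet and does not reflect the authors' code; the function names `binarize` and `foreground_fraction`, the 0.5 threshold, and the 3x3 score map are all illustrative assumptions.

```python
from typing import List

ProbMap = List[List[float]]  # per-pixel foreground scores in [0, 1]
Mask = List[List[int]]       # per-pixel labels: 1 = foreground, 0 = background


def binarize(prob_map: ProbMap, threshold: float = 0.5) -> Mask:
    """Label a pixel 1 if its score reaches the threshold, else 0.

    The 0.5 default is an assumption made for illustration only.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def foreground_fraction(mask: Mask) -> float:
    """Return the share of pixels labeled foreground (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# Hypothetical 3x3 score map, standing in for a model's per-pixel prediction.
scores = [
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.8],
    [0.2, 0.7, 0.4],
]
mask = binarize(scores)
```

The same mask format serves either task; only the meaning of "foreground" changes (a concealed object for COD, the most prominent object for SOD).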

Code & Models

Repositories

linuxsino/senet (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Infrared Target Detection Methodologies · Advanced Image Fusion Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Multi-Head Attention · Softmax · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings