Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection

Xinhao Deng; Pingping Zhang; Wei Liu; Huchuan Lu

arXiv:2308.03826 · cs.CV · September 6, 2023 · 2 cites

TL;DR

This paper introduces a new high-resolution saliency detection dataset and a novel Recurrent Multi-scale Transformer model that improves the accuracy of high-resolution salient object detection.

Contribution

It presents the largest high-resolution SOD dataset (HRS10K) and a novel RMFormer model that enhances high-resolution saliency detection through recurrent multi-scale refinement.

Findings

01 RMFormer outperforms existing methods on benchmarks.

02 HRS10K dataset facilitates training and evaluation.

03 Proposed method achieves more complete object regions.

Abstract

Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video. As an important pre-processing step, it has many potential applications in multimedia and vision tasks. With the advance of imaging devices, SOD on high-resolution images has recently come into great demand. However, traditional SOD methods are largely limited to low-resolution images, which makes them difficult to adapt to High-Resolution SOD (HRSOD). Although some HRSOD methods have emerged, no dataset is large enough for training and evaluation. Moreover, current HRSOD methods generally produce incomplete object regions and irregular object boundaries. To address these issues, in this work we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution. As far as we know, it is the largest dataset for the HRSOD task,…

Code & Models

Repositories

drowsymon/rmformer
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding