Recurrent Multi-scale Transformer for High-Resolution Salient Object Detection
Xinhao Deng, Pingping Zhang, Wei Liu, Huchuan Lu

TL;DR
This paper introduces a new high-resolution saliency detection dataset and a novel Recurrent Multi-scale Transformer model that improves the accuracy of high-resolution salient object detection.
Contribution
It presents the largest high-resolution SOD dataset (HRS10K) and a novel RMFormer model that enhances high-resolution saliency detection through recurrent multi-scale refinement.
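The contribution line names recurrent multi-scale refinement but gives no detail. The toy sketch below uses only the Python standard library to show the general coarse-to-fine idea. A coarse saliency map is predicted on a downsampled image, then upsampled, and the same refinement function corrects it at each finer scale. This is not RMFormer. The helper names (`build_pyramid`, `refine_step`, `recurrent_refine`), the 2x2 average-pooling pyramid, and the blending rule that stands in for a learned Transformer are all hypothetical.

```python
from typing import List

Grid = List[List[float]]


def downsample(img: Grid) -> Grid:
    """Halve resolution with 2x2 average pooling (assumes even height and width)."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w // 2)
        ]
        for i in range(h // 2)
    ]


def upsample(mask: Grid) -> Grid:
    """Double resolution by nearest-neighbour repetition."""
    out: Grid = []
    for row in mask:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(img: Grid, levels: int) -> List[Grid]:
    """Return the image at `levels` scales, ordered coarsest to finest."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


def refine_step(img: Grid, prior: Grid, alpha: float = 0.5) -> Grid:
    """Hypothetical refinement: blend the upsampled prior with a per-pixel cue.

    A real model would run a learned network here; raw intensity is a stand-in.
    """
    return [
        [alpha * p + (1 - alpha) * x for p, x in zip(prow, xrow)]
        for prow, xrow in zip(prior, img)
    ]


def recurrent_refine(img: Grid, levels: int = 3) -> Grid:
    """Coarse-to-fine loop: the same refine_step is reused at every scale,
    each time fed its own previous (upsampled) output."""
    pyramid = build_pyramid(img, levels)
    saliency = [row[:] for row in pyramid[0]]  # placeholder coarse prediction
    for level_img in pyramid[1:]:
        saliency = refine_step(level_img, upsample(saliency))
    return saliency
```

For example, `recurrent_refine([[0.0, 1.0], [1.0, 0.0]], levels=2)` pools the image to `[[0.5]]`, upsamples that prior, and blends it with the full-resolution input to give `[[0.25, 0.75], [0.75, 0.25]]`. The design point the sketch illustrates is weight sharing across scales: one refinement function is applied recurrently, so the low-resolution estimate supplies global context and each finer pass adds detail. Input sides must be divisible by `2 ** (levels - 1)`.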
Findings
RMFormer outperforms existing methods on benchmarks.
The HRS10K dataset facilitates training and evaluation.
The proposed method produces more complete object regions.
Abstract
Salient Object Detection (SOD) aims to identify and segment the most conspicuous objects in an image or video. As an important pre-processing step, it has many potential applications in multimedia and vision tasks. With the advance of imaging devices, there has recently been great demand for SOD on high-resolution images. However, traditional SOD methods are largely limited to low-resolution images, which makes them difficult to adapt to High-Resolution SOD (HRSOD). Although some HRSOD methods have emerged, there are no sufficiently large datasets for training and evaluation. In addition, current HRSOD methods generally produce incomplete object regions and irregular object boundaries. To address these issues, in this work we first propose a new HRS10K dataset, which contains 10,500 high-quality annotated images at 2K-8K resolution. As far as we know, it is the largest dataset for the HRSOD task,…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout · Absolute Position Encodings · Byte Pair Encoding
