SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images

Wenxi Li; Ruxin Zhang; Haozhe Lin; Yuchen Guo; Chao Ma; Xiaokang Yang

arXiv:2407.17956 · cs.CV · July 29, 2024


TL;DR

SaccadeDet is a dual-stage architecture inspired by human eye movements that efficiently detects objects in gigapixel images by focusing on regions of interest, significantly reducing computational costs and increasing speed.

Contribution

The paper introduces a two-stage detection framework tailored to gigapixel images: a region-proposal ("saccade") stage followed by a refinement ("gaze") stage, modeled on saccadic eye movements.

Findings

01. Achieves an 8x speed increase over state-of-the-art methods on the PANDA dataset

02. Processes gigapixel images at reduced computational load by analyzing only selected regions

03. Demonstrates potential in medical pathology analysis

Abstract

The advancement of deep learning in object detection has predominantly focused on megapixel images, leaving a critical gap in the efficient processing of gigapixel images. These super high-resolution images present unique challenges due to their immense size and computational demands. To address this, we introduce 'SaccadeDet', an innovative architecture for gigapixel-level object detection, inspired by the human eye saccadic movement. The cornerstone of SaccadeDet is its ability to strategically select and process image regions, dramatically reducing computational load. This is achieved through a two-stage process: the 'saccade' stage, which identifies regions of probable interest, and the 'gaze' stage, which refines detection in these targeted areas. Our approach, evaluated on the PANDA dataset, not only achieves an 8x speed increase over the state-of-the-art methods but also…
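The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed tile grid, the `score_fn` and `detect_fn` callables, the `threshold` parameter, and the toy point "objects" are all assumptions made for illustration. It only shows the general idea of a cheap pass that selects regions followed by an expensive pass restricted to those regions.

```python
from typing import Callable, List, Tuple

# (x0, y0, x1, y1) in full-resolution pixel coordinates
Box = Tuple[int, int, int, int]


def tile_grid(width: int, height: int, tile: int) -> List[Box]:
    """Split the image extent into non-overlapping tiles (edge tiles may be smaller)."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def saccade_stage(tiles: List[Box], score_fn: Callable[[Box], float],
                  threshold: float) -> List[Box]:
    """Cheap pass (hypothetical): keep tiles whose interest score reaches the threshold."""
    return [t for t in tiles if score_fn(t) >= threshold]


def gaze_stage(regions: List[Box], detect_fn: Callable[[Box], List[Box]]) -> List[Box]:
    """Expensive pass (hypothetical): run the detector only on the selected regions."""
    detections: List[Box] = []
    for region in regions:
        detections.extend(detect_fn(region))
    return detections


def two_stage_detect(width: int, height: int, tile: int,
                     score_fn: Callable[[Box], float],
                     detect_fn: Callable[[Box], List[Box]],
                     threshold: float = 0.5):
    """Return the detections plus how many tiles were selected out of the total."""
    tiles = tile_grid(width, height, tile)
    regions = saccade_stage(tiles, score_fn, threshold)
    return gaze_stage(regions, detect_fn), len(regions), len(tiles)


# Toy stand-ins: "objects" are points, and a tile is interesting if it contains one.
objects = [(150, 80), (900, 900)]


def toy_score(tile: Box) -> float:
    x0, y0, x1, y1 = tile
    return 1.0 if any(x0 <= x < x1 and y0 <= y < y1 for x, y in objects) else 0.0


def toy_detect(tile: Box) -> List[Box]:
    x0, y0, x1, y1 = tile
    return [(x - 5, y - 5, x + 5, y + 5)
            for x, y in objects if x0 <= x < x1 and y0 <= y < y1]


dets, n_selected, n_total = two_stage_detect(1000, 1000, 250, toy_score, toy_detect)
print(f"processed {n_selected} of {n_total} tiles, found {len(dets)} objects")
```

In this toy setup the expensive detector runs on only the tiles flagged by the cheap pass, which is where the savings would come from when objects are sparse. A real system would presumably score regions on a downsampled image and handle objects that straddle tile borders, neither of which this sketch attempts.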

