Increasing the Efficiency of DETR for Maritime High-Resolution Images

Tinsae Yehuala; Hao Cheng; Ville Lehtola

arXiv:2605.10269·cs.CV·May 12, 2026

TL;DR

This paper enhances DETR's efficiency for maritime high-resolution images by integrating Vision Mamba backbones, token pruning, and a tailored feature pyramid network, enabling accurate real-time object detection on resource-limited platforms.

Contribution

The paper combines Vision Mamba backbones with token pruning and a tailored feature pyramid network to improve both detection accuracy and efficiency on high-resolution maritime imagery.

Findings

1. Outperforms RT-DETR with a ResNet50 backbone in both accuracy and efficiency.

2. Achieves real-time detection on high-resolution maritime images.

3. Reduces computational load via token pruning and a specialized network design.
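The idea behind token pruning can be sketched in a few lines. This is a minimal illustration only: it assumes a simple score-based top-k selection, which may differ from the pruning rule the paper actually uses. The function `prune_tokens`, the `keep_ratio` parameter, and the example tokens and scores are all hypothetical.

```python
def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving their original order.

    Assumption: each token already has an importance score (for example,
    from a small scoring head). How the paper scores tokens is not stated here.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Indices of the n_keep largest scores.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:n_keep]
    # Restore spatial/sequence order so later layers see a consistent sequence.
    keep = sorted(top)
    return [tokens[i] for i in keep], keep


# Hypothetical patch labels standing in for token embeddings.
tokens = ["sky", "buoy", "water", "ship", "water", "sky"]
scores = [0.05, 0.90, 0.10, 0.80, 0.20, 0.02]
kept, idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept, idx)
```

With half the tokens dropped, every later layer processes a sequence half as long, which is where the compute saving would come from in a scheme like this.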

Abstract

Maritime object detection is critical for the safe navigation of unmanned surface vessels (USVs), requiring accurate recognition of obstacles from small buoys to large vessels. Real-time detection is challenging due to long distances, small object sizes, large-scale variations, edge computing limitations, and the high memory demands of high-resolution imagery. Existing solutions, such as downsampling or image splitting, often reduce accuracy or require additional processing, while memory-efficient models typically handle only limited resolutions. To overcome these limitations, we leverage Vision Mamba (ViM) backbones, which build on State Space Models (SSMs) to capture long-range dependencies while scaling linearly with sequence length. Images are tokenized into sequences for efficient high-resolution processing. For further computational efficiency, we design a tailored Feature Pyramid…
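To make the abstract's point about linear scaling concrete, the sketch below counts patch tokens for two image sizes and compares how cost grows under linear scaling (as claimed for SSM-based backbones) versus quadratic scaling (as in standard self-attention). The patch size of 16 and both resolutions are assumptions chosen for illustration; they are not taken from the paper.

```python
def num_tokens(height, width, patch=16):
    """Number of non-overlapping patch tokens (patch size is an assumption)."""
    return (height // patch) * (width // patch)


def relative_cost(n_tokens, base_tokens):
    """Return (linear, quadratic) cost growth relative to a baseline token count."""
    ratio = n_tokens / base_tokens
    return ratio, ratio ** 2


base = num_tokens(640, 640)      # assumed low-resolution baseline
hires = num_tokens(2160, 3840)   # assumed 4K-style frame
linear, quadratic = relative_cost(hires, base)

print(f"baseline tokens: {base}, high-res tokens: {hires}")
print(f"linear growth: {linear:.2f}x, quadratic growth: {quadratic:.2f}x")
```

Under these assumed sizes the token count grows by about 20x, so a model whose cost is quadratic in sequence length pays roughly 410x, while one that scales linearly pays roughly 20x.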
