Intelligent Fish Detection System with Similarity-Aware Transformer
Shengchen Li, Haobo Zuo, Changhong Fu, Zhiyong Wang, Zhiqiang Xu

TL;DR
This paper introduces FishViT, a lightweight, similarity-aware Transformer model for fast, accurate fish detection in dense groups, enhancing water-land transfer efficiency and reducing labor costs.
Contribution
The work proposes a novel similarity-aware vision Transformer with multi-level encoding and soft-threshold attention for improved fish detection accuracy and speed.
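The summary does not say how the soft-threshold attention is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the standard soft-thresholding (shrinkage) operator, sign(x)·max(|x| − τ, 0), is applied to scaled dot-product attention scores before the softmax. The function names, the threshold `tau`, and the placement of the threshold are all illustrative assumptions.

```python
import math

def soft_threshold(x, tau):
    """Standard shrinkage operator: sign(x) * max(|x| - tau, 0)."""
    return math.copysign(max(abs(x) - tau, 0.0), x)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_threshold_attention(query, keys, values, tau):
    """Single-query attention with soft-thresholded scores.

    ASSUMPTION: thresholding the scaled scores before the softmax is a
    guess for illustration; the paper may define the mechanism differently.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Scores with magnitude <= tau collapse to 0, so weak matches all
    # receive the same weight and only strong matches stand out.
    shrunk = [soft_threshold(s, tau) for s in scores]
    weights = softmax(shrunk)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Hypothetical toy input: one query, three keys, three 2-D values.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.1, 0.0], [-0.1, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
output = soft_threshold_attention(query, keys, values, tau=0.5)
```

In this toy input the second and third keys produce scores below the threshold, so both are shrunk to zero and receive equal weight, while the first key dominates the output.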
Findings
Achieves over 80 FPS in challenging scenarios.
Demonstrates robustness and effectiveness on a new high-resolution benchmark.
Validates practicality in real water-land transfer scenarios.
Abstract
Fish detection in water-land transfer contributes significantly to the fishery industry. However, manual fish detection through crowd collaboration is inefficient, expensive, and insufficiently accurate. To enhance water-land transfer efficiency, improve detection accuracy, and reduce labor costs, this work designs a new lightweight, plug-and-play edge intelligent vision system that automatically performs fast fish detection with a high-speed camera. Moreover, a novel similarity-aware vision Transformer for fast fish detection (FishViT) is proposed to identify, onboard, every single fish in a dense group of similar-looking fish. Specifically, a novel similarity-aware multi-level encoder is developed to enhance multi-scale features in parallel, thereby yielding discriminative representations for fish of varying size. Additionally, a new soft-threshold attention mechanism is…
Taxonomy
Topics: Water Quality Monitoring Technologies
