FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting
Yi Liu, Jingyu Song, Vedanth Kallakuri, Katherine A. Skinner

TL;DR
FishDetector-R1 is a weakly supervised framework that combines a multimodal large language model (MLLM) with reinforcement fine-tuning to improve underwater fish detection, segmentation, and counting with minimal annotations.
Contribution
It introduces a unified MLLM-based framework with a detect-to-count prompt and Reinforcement Learning from Verifiable Reward (RLVR), improving performance and robustness in underwater fish analysis.
Findings
Improves AP by 20% and mIoU by 10% over baselines on the DeepFish dataset
Reduces MAE by 30% and GAME by 35% on the same dataset
Generalizes to other underwater datasets
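For readers unfamiliar with the counting metrics, the following is a minimal sketch of how MAE and GAME are commonly computed from point locations. It assumes the standard definition of GAME (Grid Average Mean absolute Error), where level L splits each image into 4^L equal cells and sums the per-cell absolute count errors. The function names and the point-list input format are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and true per-image counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true)))

def grid_counts(points, width, height, level):
    """Count points falling in each cell of a 2^level x 2^level grid."""
    cells = 2 ** level
    grid = np.zeros((cells, cells), dtype=int)
    for x, y in points:
        # Clamp so points on the right/bottom edge land in the last cell.
        col = min(int(x / width * cells), cells - 1)
        row = min(int(y / height * cells), cells - 1)
        grid[row, col] += 1
    return grid

def game(pred_points, true_points, width, height, level):
    """GAME(level): per-cell absolute count error, averaged over images.

    Assumes all images share the same width and height (illustrative only).
    """
    total = 0.0
    for pred, true in zip(pred_points, true_points):
        diff = grid_counts(pred, width, height, level) - grid_counts(true, width, height, level)
        total += float(np.abs(diff).sum())
    return total / len(true_points)

# One 100x100 image: the total count matches, but one fish is misplaced.
pred_points = [[(10, 10), (90, 90)]]
true_points = [[(10, 10), (10, 90)]]
game_l0 = game(pred_points, true_points, 100, 100, level=0)
game_l1 = game(pred_points, true_points, 100, 100, level=1)
```

In this example GAME(0) equals the plain count error (zero here), while GAME(1) penalizes the misplaced detection, which is why GAME is reported alongside MAE as a measure of spatially consistent counting.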
Abstract
Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified MLLM-based framework for fish detection, segmentation, and counting under weak supervision. On the DeepFish dataset, our framework achieves substantial gains over baselines, improving AP by 20% and mIoU by 10%, while reducing MAE by 30% and GAME by 35%. These improvements stem from two key components: a novel detect-to-count prompt that enforces spatially consistent detections and counts, and Reinforcement Learning from Verifiable Reward (RLVR) with a complementary scalable paradigm leveraging sparse point labels. Ablation studies further validate the effectiveness of this reward design. Moreover, the improvement generalizes well to other underwater datasets, confirming strong cross-domain robustness.…
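To make the idea of a verifiable reward from sparse point labels concrete, here is a toy sketch. It is not the paper's reward design, which the abstract does not specify; the function `toy_point_reward`, its two terms, and the equal 0.5 weights are all hypothetical choices made for illustration. The sketch scores a set of predicted boxes against annotated fish points by combining point coverage with count agreement.

```python
def point_in_box(point, box):
    """Return True if (x, y) lies inside the box (x1, y1, x2, y2), edges included."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def toy_point_reward(pred_boxes, gt_points):
    """Hypothetical reward in [0, 1] computed only from point labels.

    Localization term: fraction of labeled points covered by some predicted box.
    Count term: 1 when the number of boxes equals the number of points,
    decaying as the absolute count error grows.
    """
    if gt_points:
        covered = sum(
            any(point_in_box(p, b) for b in pred_boxes) for p in gt_points
        )
        localization = covered / len(gt_points)
    else:
        # No fish annotated: reward an empty prediction, penalize any box.
        localization = 1.0 if not pred_boxes else 0.0
    count = 1.0 / (1.0 + abs(len(pred_boxes) - len(gt_points)))
    # Equal weighting is an arbitrary assumption for this sketch.
    return 0.5 * localization + 0.5 * count

# Two annotated fish; the first prediction covers both, the second misses one.
gt_points = [(10, 10), (60, 60)]
full_reward = toy_point_reward([(0, 0, 20, 20), (50, 50, 70, 70)], gt_points)
partial_reward = toy_point_reward([(0, 0, 20, 20)], gt_points)
```

Because both terms can be checked programmatically against the point labels, a reward of this kind needs no box or mask annotations, which is the property that makes point supervision attractive for reinforcement fine-tuning.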
Taxonomy
Topics: Water Quality Monitoring Technologies · Advanced Neural Network Applications · Image Enhancement Techniques
