Progressive Multimodal Interaction Network for Reliable Quantification of Fish Feeding Intensity in Aquaculture
Shulong Zhang, Mingyuan Yao, Jiayin Zhao, Daoliang Li, Yingyi Chen, Haihua Wang

TL;DR
This paper introduces a Progressive Multimodal Interaction Network (PMIN) that fuses image, audio, and water-wave data to reliably quantify fish feeding intensity, enhancing accuracy and robustness in aquaculture monitoring.
Contribution
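The paper presents a multimodal fusion framework that combines unified feature extraction, cross-modal interaction mechanisms, and adaptive decision fusion. It targets the inconsistent responses and decision conflicts between modalities that existing fusion methods often overlook, with the aim of making the quantification results more reliable.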
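The summary does not say how the adaptive decision fusion is computed. As a rough illustration of the general idea only, the sketch below weights each modality's class probabilities by a confidence score derived from prediction entropy, so that an uncertain modality contributes less to the final decision. The entropy-based weighting, the function names, and the four-class example are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def adaptive_decision_fusion(modality_logits):
    """Fuse per-modality predictions with entropy-based confidence weights.

    modality_logits: dict mapping a modality name (hypothetical keys such as
    "image", "audio", "wave") to a list of class logits. All lists must have
    the same length, and there must be at least two classes.
    Returns (fused_probs, weights).
    """
    probs = {name: softmax(l) for name, l in modality_logits.items()}
    num_classes = len(next(iter(probs.values())))
    max_entropy = math.log(num_classes)

    # Assumed confidence measure: 1 minus normalized entropy, clamped at 0
    # to guard against tiny negative values from floating-point rounding.
    conf = {
        name: max(0.0, 1.0 - entropy(p) / max_entropy)
        for name, p in probs.items()
    }

    total = sum(conf.values())
    if total == 0.0:
        # Every modality is maximally uncertain: fall back to equal weights.
        weights = {name: 1.0 / len(conf) for name in conf}
    else:
        weights = {name: c / total for name, c in conf.items()}

    fused = [
        sum(weights[name] * probs[name][c] for name in probs)
        for c in range(num_classes)
    ]
    return fused, weights


# Made-up logits for four feeding-intensity classes (illustrative values).
example = {
    "image": [4.0, 0.5, 0.1, 0.0],   # confident in class 0
    "audio": [0.2, 0.1, 0.0, 0.1],   # nearly uniform, low confidence
    "wave": [0.0, 1.0, 0.0, 0.0],    # mild preference for class 1
}
fused_probs, fusion_weights = adaptive_decision_fusion(example)
predicted_class = max(range(len(fused_probs)), key=fused_probs.__getitem__)
```

In this toy input the image branch is far more confident than the other two, so it receives most of the weight and the fused prediction follows it even though the water-wave branch prefers a different class. A learned gating network would be another plausible way to produce the weights.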
Findings
PMIN achieves 96.76% accuracy on a fish feeding dataset.
The method outperforms both homogeneous and heterogeneous comparison models.
Ablation studies confirm the effectiveness of each component.
Abstract
Accurate quantification of fish feeding intensity is crucial for precision feeding in aquaculture, as it directly affects feed utilization and farming efficiency. Although multimodal fusion has proven to be an effective solution, existing methods often overlook the inconsistencies in responses and decision conflicts between different modalities, thus limiting the reliability of the quantification results. To address this issue, this paper proposes a Progressive Multimodal Interaction Network (PMIN) that integrates image, audio, and water-wave data for fish feeding intensity quantification. Specifically, a unified feature extraction framework is first constructed to map inputs from different modalities into a structurally consistent feature space, thereby reducing representational discrepancies across modalities. Then, an auxiliary-modality reinforcement primary-modality mechanism is…
