VisText-Mosquito: A Unified Multimodal Dataset for Visual Detection, Segmentation, and Textual Explanation on Mosquito Breeding Sites

Md. Adnanul Islam; Md. Faiyaz Abdullah Sayeedi; Md. Asaduzzaman Shuvo; Shahanur Rahman Bappy; Md Asiful Islam; Swakkhar Shatabda

arXiv:2506.14629·cs.CV·April 14, 2026

TL;DR

VisText-Mosquito is a comprehensive multimodal dataset and model framework for detecting, segmenting, and explaining mosquito breeding sites to aid in disease prevention.

Contribution

The paper introduces a new multimodal dataset and fine-tuned models for automated mosquito breeding site analysis using visual and textual data.

Findings

01 YOLOv9s achieved a precision of 0.92926 in object detection.

02 YOLOv11n-Seg reached a segmentation precision of 0.91587.

03 Mosquito-LLaMA3-8B achieved a BLEU score of 54.7 in explanation generation.

Abstract

Mosquito-borne diseases pose a major global health risk, requiring early detection and proactive control of breeding sites to prevent outbreaks. In this paper, we present VisText-Mosquito, a multimodal dataset that integrates visual and textual data to support automated detection, segmentation, and explanation for mosquito breeding site analysis. The dataset includes 1,828 annotated images for object detection, 142 images for water surface segmentation, and natural language explanation texts linked to each image. The YOLOv9s model achieves the highest precision of 0.92926 and mAP@50 of 0.92891 for object detection, while YOLOv11n-Seg reaches a segmentation precision of 0.91587 and mAP@50 of 0.79795. For textual explanation generation, we tested a range of large vision-language models (LVLMs) in both zero-shot and few-shot settings. Our fine-tuned Mosquito-LLaMA3-8B model achieved the…
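For readers unfamiliar with the detection metrics quoted in the abstract, the following is a minimal sketch of how precision at an IoU threshold of 0.5 (the "50" in mAP@50) can be computed for a single image. It is not the authors' evaluation code: the function names `iou` and `precision_at_iou`, the `(x1, y1, x2, y2)` box format, and the greedy matching rule are assumptions made for illustration. The full mAP@50 additionally averages precision over recall levels and classes, which this sketch omits.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predicted boxes matched to a distinct ground-truth box.

    Hypothetical helper: each prediction is greedily matched to the
    remaining ground-truth box with the highest IoU, and counts as a
    true positive only if that IoU reaches the threshold.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box is used once
    return true_positives / len(predictions) if predictions else 0.0


# Illustrative boxes only, not data from the paper.
preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
truth = [(0, 0, 10, 8)]
print(precision_at_iou(preds, truth))
```

In this toy case one of the two predictions overlaps a labelled box well enough to count, and the other is a false positive, so precision falls as spurious detections are added.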

Code & Models

Repositories

adnanul-islam-jisun/VisText-Mosquito (GitHub)