MAviS: A Multimodal Conversational Assistant For Avian Species
Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shahbaz Khan, Rao Anwer, Salman Khan, Hisham Cholakkal

TL;DR
This paper introduces MAviS, a multimodal conversational AI for avian species, featuring a new dataset, model, and benchmark to improve species-specific ecological understanding and question answering.
Contribution
The paper presents a novel multimodal dataset, MAviS-Dataset, and a specialized LLM, MAviS-Chat, for avian species, along with a benchmark, MAviS-Bench, to advance ecological AI applications.
Findings
MAviS-Chat significantly outperforms baseline models.
It achieves state-of-the-art results on avian species question answering.
The results demonstrate the effectiveness of domain-specific multimodal models.
Abstract
Fine-grained understanding and species-specific multimodal question answering are vital for advancing biodiversity conservation and ecological monitoring. However, existing multimodal large language models face challenges when it comes to specialized topics like avian species, making it harder to provide accurate and contextually relevant information in these areas. To address this limitation, we introduce the MAviS-Dataset, a large-scale multimodal avian species dataset that integrates image, audio, and text modalities for over 1,000 bird species, comprising both pretraining and instruction-tuning subsets enriched with structured question-answer pairs. Building on the MAviS-Dataset, we introduce MAviS-Chat, a multimodal LLM that supports audio, vision, and text and is designed for fine-grained species understanding, multimodal question answering, and scene-specific description…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Neural Network Applications
