VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

M. Maruf; Arka Daw; Kazi Sajeed Mehrab; Harish Babu Manogaran; Abhilash Neog; Medha Sawhney; Mridul Khurana; James P. Balhoff; Yasin Bakis; Bahadir Altintas; Matthew J. Thompson; Elizabeth G. Campolongo; Josef C. Uyeda; Hilmar Lapp; Henry L. Bart; Paula M. Mabee; Yu Su; Wei-Lun Chao; Charles Stewart; Tanya Berger-Wolf; Wasila Dahdul; Anuj Karpatne

arXiv:2408.16176·cs.CV·August 30, 2024·2 cites

TL;DR

This paper evaluates 12 state-of-the-art vision-language models on a new dataset, VLM4Bio, to assess their ability to answer biologically relevant questions from images of organisms without fine-tuning.

Contribution

The paper introduces VLM4Bio, a benchmark of 469K question-answer pairs over 30K images for evaluating VLMs in organismal biology, and analyzes the models' performance and reasoning capabilities on biologically relevant tasks.

Findings

01. VLMs show varying performance on biological questions.

02. Prompting techniques can improve model responses.

03. Current models sometimes hallucinate reasoning.

Abstract

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA…
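The evaluation setup described in the abstract (question-answer pairs drawn from several organism groups and tasks, answered without fine-tuning) can be summarized with a per-group, per-task accuracy table. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code: the record fields (`group`, `task`, `prediction`, `answer`), the task labels, and the exact-match scoring rule are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so formatting differences are not scored as errors."""
    return " ".join(text.lower().split())


def accuracy_by_group_and_task(records):
    """Return {(group, task): accuracy} using exact match on normalized strings.

    Each record is a dict with hypothetical keys: group, task, prediction, answer.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = (rec["group"], rec["task"])
        total[key] += 1
        if normalize(rec["prediction"]) == normalize(rec["answer"]):
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Toy records invented for illustration; they are not taken from VLM4Bio.
records = [
    {"group": "fish", "task": "counting", "prediction": "2", "answer": "2"},
    {"group": "fish", "task": "counting", "prediction": "3", "answer": "2"},
    {"group": "bird", "task": "identification", "prediction": " Yes ", "answer": "yes"},
]
scores = accuracy_by_group_and_task(records)
```

Exact match is the simplest scoring rule and suits closed-form answers such as counts or yes/no; free-text answers would likely need a looser comparison.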

Code & Models

Repositories

sammarfy/vlm4bio
PyTorch · Official

Datasets

imageomics/VLM4Bio
dataset · 330 dl

Videos

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications