AiSciVision: A Framework for Specializing Large Multimodal Models in Scientific Image Classification
Brendan Hogan, Anmol Kabra, Felipe Siqueira Pacheco, Laura Greenstreet, Joshua Fan, Aaron Ferber, Marta Ummus, Alecsander Brito, Olivia Graham, Lillian Aoki, Drew Harvell, Alex Flecker, Carla Gomes

TL;DR
AiSciVision is a framework that specializes large multimodal models for scientific image classification by mimicking expert reasoning, improving interpretability and performance in niche scientific domains.
Contribution
The paper introduces AiSciVision, combining visual retrieval-augmented generation and domain-specific tools to enhance interpretability and accuracy of large multimodal models in scientific image classification.
Findings
Outperforms fully supervised models in both low-labeled and fully labeled data settings
Provides interpretable reasoning transcripts for each prediction
Successfully deployed in real-world aquaculture research applications
Abstract
Trust and interpretability are crucial for the use of Artificial Intelligence (AI) in scientific research, but current models often operate as black boxes offering limited transparency and justifications for their outputs. We introduce AiSciVision, a framework that specializes Large Multimodal Models (LMMs) into interactive research partners and classification models for image classification tasks in niche scientific domains. Our framework uses two key components: (1) Visual Retrieval-Augmented Generation (VisRAG) and (2) domain-specific tools utilized in an agentic workflow. To classify a target image, AiSciVision first retrieves the most similar positive and negative labeled images as context for the LMM. Then the LMM agent actively selects and applies tools to manipulate and inspect the target image over multiple rounds, refining its analysis before making a final prediction. These…
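The two-stage workflow described in the abstract (retrieve the nearest positive and negative labeled examples, then let an agent apply tools over several rounds) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper's implementation: images are represented by precomputed embedding vectors, similarity is cosine similarity, and the names `retrieve_context`, `run_agent`, and `choose_tool` are hypothetical. The `choose_tool` callable stands in for the LMM's decision about which tool to apply next.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical record layout: (image id, embedding, is_positive label).
LabeledImage = Tuple[str, Vector, bool]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_context(
    target: Vector, labeled: List[LabeledImage]
) -> Dict[str, Optional[str]]:
    """Return the ids of the most similar positive and negative labeled images."""
    best_ids: Dict[str, Optional[str]] = {"positive": None, "negative": None}
    best_scores = {"positive": -math.inf, "negative": -math.inf}
    for image_id, embedding, is_positive in labeled:
        key = "positive" if is_positive else "negative"
        score = cosine_similarity(target, embedding)
        if score > best_scores[key]:
            best_scores[key] = score
            best_ids[key] = image_id
    return best_ids


def run_agent(
    context: Dict[str, Optional[str]],
    tools: Dict[str, Callable[[], str]],
    choose_tool: Callable[[List[str]], Optional[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Apply tools for up to max_rounds and return the transcript of steps.

    choose_tool is a placeholder for the LMM: it reads the transcript so far
    and returns a tool name, or None to stop and make a final prediction.
    """
    transcript = [f"context: {context}"]
    for _ in range(max_rounds):
        name = choose_tool(transcript)
        if name is None:
            break
        transcript.append(f"{name}: {tools[name]()}")
    return transcript
```

In this sketch the transcript returned by `run_agent` plays the role of the interpretable reasoning record: each entry names the tool applied and what it returned, so the steps behind a prediction can be inspected afterwards.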
Taxonomy
Topics: Image Retrieval and Classification Techniques
