MARVIS: Modality Adaptive Reasoning over VISualizations

Benjamin Feuer; Lennart Purucker; Oussama Elachqar; Chinmay Hegde

arXiv:2507.01544·cs.LG·April 30, 2026

TL;DR

MARVIS is a versatile system that turns data from many modalities into visualizations, enabling large vision-language models (VLMs) to perform well across diverse domains without domain-specific training.

Contribution

The paper introduces a modality-adaptive reasoning approach that renders latent embeddings as visualizations for a VLM to interpret, achieving competitive results across multiple domains with a single 3B-parameter model.

Findings

01. Outperforms Gemini 2.0 by 16% on average across domains.

02. Achieves competitive performance on vision, audio, biological, and tabular data.

03. Reduces the gap between generalist models and specialized domain methods.

Abstract

Predictive applications of machine learning often rely on small (sub-1B-parameter) specialized models tuned to particular domains or modalities. Such models often achieve excellent performance but lack flexibility. LLMs and VLMs offer versatility but typically underperform specialized predictors, especially on non-traditional modalities and long-tail domains. We propose MARVIS (Modality Adaptive Reasoning over VISualizations), a system that transforms latent embedding spaces into visual representations and then leverages the spatial and fine-grained reasoning skills of VLMs to interpret those visualizations and make predictions from them. MARVIS achieves competitive performance across vision, audio, biological, and tabular domains using a single 3B-parameter model, yielding results that beat Gemini 2.0 by 16% on average. MARVIS drastically reduces the gap between…
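The core idea in the abstract (embeddings become a picture, and a VLM reasons over the picture) can be illustrated with a small, standard-library-only sketch. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the toy 5-dimensional embeddings, the use of PCA by power iteration as the 2D projection, the SVG scatter plot as the visualization, and all function names. The VLM call itself is omitted; only the image and an example prompt are produced.

```python
import math
import random

def center(rows):
    """Subtract the per-dimension mean from every embedding."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]

def top_component(rows, iters=200, seed=0):
    """Approximate the leading principal axis by power iteration on X^T X."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        scores = [sum(r[j] * v[j] for j in range(dim)) for r in rows]
        w = [sum(scores[i] * rows[i][j] for i in range(len(rows)))
             for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return v

def project_2d(rows):
    """Project embeddings onto their top two principal axes (deflation)."""
    x = center(rows)
    v1 = top_component(x)
    s1 = [sum(a * b for a, b in zip(r, v1)) for r in x]
    # Remove the first axis so the second power iteration finds the next one.
    deflated = [[r[j] - s1[i] * v1[j] for j in range(len(v1))]
                for i, r in enumerate(x)]
    v2 = top_component(deflated, seed=1)
    s2 = [sum(a * b for a, b in zip(r, v2)) for r in deflated]
    return list(zip(s1, s2))

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#9467bd"]

def render_svg(points, labels, query_index, size=400, pad=20):
    """Draw labeled points as colored dots and the query as a large black dot."""
    xs, ys = zip(*points)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def scale(v, lo, hi):
        span = (hi - lo) or 1.0
        return pad + (v - lo) / span * (size - 2 * pad)

    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{size}" height="{size}">']
    for i, (px, py) in enumerate(points):
        cx, cy = scale(px, x_lo, x_hi), scale(py, y_lo, y_hi)
        if i == query_index:
            parts.append(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="8" fill="black"/>')
        else:
            color = PALETTE[labels[i] % len(PALETTE)]
            parts.append(f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="4" fill="{color}"/>')
    parts.append("</svg>")
    return "\n".join(parts)

def toy_embeddings(seed=0):
    """Two hypothetical classes in 5-D plus one unlabeled query near class 1."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, centre in enumerate([0.0, 5.0]):
        for _ in range(20):
            rows.append([centre + rng.gauss(0, 0.5) for _ in range(5)])
            labels.append(label)
    rows.append([5.0] * 5)  # the query point; its label is unknown
    labels.append(None)
    return rows, labels

rows, labels = toy_embeddings()
points = project_2d(rows)
svg = render_svg(points, labels, query_index=len(rows) - 1)
# Illustrative prompt only; the wording is not from the paper.
prompt = "The black point is the query. Which colored cluster does it belong to?"
```

In a full pipeline of this shape, the rendered image and the prompt would be passed to a VLM, whose answer serves as the prediction. Because the model only ever sees a plot, the same prompt works regardless of which modality the embeddings came from.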

Code & Models

Repositories

penfever/marvis (GitHub)
