A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

Shuai Wang; Hongyi Zhu; Jia-Hong Huang; Yixian Shen; Chengxi Zeng; Stevan Rudinac; Monika Kackovic; Nachoem Wijnberg; Marcel Worring

arXiv:2604.19689·cs.AI·April 22, 2026

TL;DR

A-MAR introduces an agent-based framework for multimodal art retrieval that explicitly plans reasoning steps, improving interpretability and evidence grounding in artwork understanding.

Contribution

The paper presents a structured-reasoning-plan approach to multimodal art retrieval, in which retrieval is conditioned on an explicit plan to improve explainability and support multi-step reasoning.

Findings

01

A-MAR outperforms static retrieval and baseline models in explanation quality.

02

It demonstrates improved evidence grounding and reasoning on ArtCoT-QA.

03

Code and data are publicly available in the ShuaiWang97/A-MAR GitHub repository.

Abstract

Understanding artworks requires multi-step reasoning over visual content and cultural, historical, and stylistic context. While recent multimodal large language models show promise in artwork explanation, they rely on implicit reasoning and internalized knowledge, limiting interpretability and explicit evidence grounding. We propose A-MAR, an Agent-based Multimodal Art Retrieval framework that explicitly conditions retrieval on structured reasoning plans. Given an artwork and a user query, A-MAR first decomposes the task into a structured reasoning plan that specifies the goals and evidence requirements for each step. Retrieval is then conditioned on this plan, enabling targeted evidence selection and supporting step-wise, grounded explanations. To evaluate agent-based multimodal reasoning within the art domain, we introduce ArtCoT-QA. This diagnostic benchmark features multi-step…
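
The plan-then-retrieve idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `PlanStep`, `retrieve`, and `explain`, the hand-written plan, the three-sentence corpus, and the word-overlap scoring are all assumptions made for illustration. A real system would presumably generate the plan with a model and retrieve with a learned multimodal retriever.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    """One step of a hypothetical reasoning plan (illustrative only)."""
    goal: str           # what this step should establish
    evidence_need: str  # what kind of evidence the step requires


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(step: PlanStep, corpus: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the step's evidence requirement.

    Word overlap is a stand-in for a real retriever; it is used here only
    to show that each step issues its own, targeted query.
    """
    query = tokens(step.evidence_need)
    ranked = sorted(corpus, key=lambda doc: len(query & tokens(doc)), reverse=True)
    return ranked[:k]


def explain(plan: list[PlanStep], corpus: list[str]) -> list[tuple[str, list[str]]]:
    """Pair each step's goal with the evidence retrieved for that step."""
    return [(step.goal, retrieve(step, corpus)) for step in plan]


# Hand-written plan and corpus, both invented for this example.
plan = [
    PlanStep("Identify the depicted scene", "visual content figures composition"),
    PlanStep("Place the work in context", "historical period artistic movement"),
]
corpus = [
    "The composition shows two figures beside a river.",
    "The artistic movement of the period favoured outdoor light.",
    "Auction records list several later owners.",
]

for goal, evidence in explain(plan, corpus):
    print(goal, "->", evidence)
```

In this sketch each step retrieves its own evidence, so the first step selects the sentence about composition and the second selects the sentence about the artistic movement, whereas a single query built from the whole question would not separate the two needs.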

Code & Models

Repositories

ShuaiWang97/A-MAR
github