CiQi-Agent: Aligning Vision, Tools and Aesthetics in Multimodal Agent for Cultural Reasoning on Chinese Porcelains

Wenhan Wang; Zhixiang Zhou; Zhongtian Ma; Yanzhu Chen; Ziyu Lin; Hao Sheng; Pengfei Liu; Honglin Ma; Wenqi Shao; Qiaosheng Zhang; Yu Qiao

arXiv:2603.28474·cs.CV·March 31, 2026

TL;DR

CiQi-Agent is a multimodal AI system designed for detailed analysis and connoisseurship of Chinese porcelain, integrating vision, retrieval, and reasoning to outperform existing models on a large expert-annotated dataset.

Contribution

The paper introduces CiQi-Agent, a novel multimodal agent with a large dataset and benchmark for Chinese porcelain analysis, enabling fine-grained, explainable connoisseurship.

Findings

01 CiQi-Agent outperforms all competitive models on six porcelain attributes.

02 CiQi-Agent achieves 12.2% higher accuracy than GPT-5 on CiQi-Bench.

03 The dataset and model are publicly available for research.

Abstract

The connoisseurship of antique Chinese porcelain demands extensive historical expertise, material understanding, and aesthetic sensitivity, making it difficult for non-specialists to engage. To democratize cultural-heritage understanding and assist expert connoisseurship, we introduce CiQi-Agent -- a domain-specific Porcelain Connoisseurship Agent for intelligent analysis of antique Chinese porcelain. CiQi-Agent supports multi-image porcelain inputs and enables vision tool invocation and multimodal retrieval-augmented generation, performing fine-grained connoisseurship analysis across six attributes: dynasty, reign period, kiln site, glaze color, decorative motif, and vessel shape. Beyond attribute classification, it captures subtle visual details, retrieves relevant domain knowledge, and integrates visual and textual evidence to produce coherent, explainable connoisseurship…
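
To make the six-attribute output concrete, here is a minimal Python sketch of how one prediction could be represented and scored per attribute. The field names follow the attributes listed in the abstract; the class name `PorcelainAttributes`, the function `per_attribute_accuracy`, and the exact-match scoring rule are assumptions made for illustration, not the paper's data schema or evaluation protocol.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PorcelainAttributes:
    """Hypothetical record holding the six attributes named in the abstract."""
    dynasty: str
    reign_period: str
    kiln_site: str
    glaze_color: str
    decorative_motif: str
    vessel_shape: str


def per_attribute_accuracy(predictions, references):
    """Return exact-match accuracy for each attribute.

    Assumption: a prediction counts as correct for an attribute only when
    its string equals the reference string. The benchmark itself may score
    answers differently.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        raise ValueError("at least one example is required")

    scores = {}
    for field in fields(PorcelainAttributes):
        hits = sum(
            getattr(pred, field.name) == getattr(ref, field.name)
            for pred, ref in zip(predictions, references)
        )
        scores[field.name] = hits / len(references)
    return scores
```

Calling `per_attribute_accuracy` with two equal-length lists of `PorcelainAttributes` returns a dictionary with one accuracy value per attribute, which keeps results for, say, reign period separate from results for vessel shape.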

Code & Models

Repositories

https://huggingface.co/datasets/SII-Monument-Valley/CiQi-VQA

Datasets

SII-Monument-Valley/CiQi-VQA
dataset · 440 downloads
