Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning
Yu Xu, Yuxin Zhang, Juan Cao, Lin Gao, Chunyu Wang, Oliver Deussen, Tong-Yee Lee, Fan Tang

TL;DR
This paper introduces a multi-agent framework, grounded in Conceptual Blending Theory, that lets generative models transfer the abstract logic of a visual metaphor from a reference image onto a new target subject. The authors report improved metaphor consistency and visual creativity in the generated images.
Contribution
It proposes a schema-driven, multi-agent system based on Conceptual Blending Theory for autonomous visual metaphor transfer, advancing beyond pixel-level generative models.
Findings
Outperforms state-of-the-art methods in metaphor consistency and analogy appropriateness
Enhances visual creativity in AI-generated images
Demonstrates effectiveness through extensive experiments and human evaluations
Abstract
A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. Despite the remarkable progress of generative AI, existing models remain largely confined to pixel-level instruction alignment and surface-level appearance preservation, failing to capture the underlying abstract logic necessary for genuine metaphorical generation. To bridge this gap, we introduce the task of Visual Metaphor Transfer (VMT), which challenges models to autonomously decouple the "creative essence" from a reference image and re-materialize that abstract logic onto a user-specified target subject. We propose a cognitive-inspired, multi-agent framework that operationalizes Conceptual Blending Theory (CBT) through a novel Schema Grammar ("G"). This structured representation decouples relational invariants…
Taxonomy
Topics: Language, Metaphor, and Cognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
