CAL-RAG: Retrieval-Augmented Multi-Agent Generation for Content-Aware Layout Design
Najmeh Forouzandehmehr, Reza Yousefi Maragheh, Sriram Kollipara, Kai Zhao, Topojoy Biswas, Evren Korpeoglu, Kannan Achan

TL;DR
CAL-RAG is a retrieval-augmented, multi-agent framework that combines multimodal retrieval, large language models, and iterative reasoning to generate semantically aligned, visually coherent content-aware layouts, achieving state-of-the-art results on the PKU PosterLayout dataset.
Contribution
This work presents CAL-RAG, a framework that combines retrieval, LLMs, and collaborative agents for automated layout design. It targets two weaknesses of prior models: a lack of grounding in contextual design exemplars, and poor semantic alignment and visual coherence.
Findings
Achieves state-of-the-art performance on the PKU PosterLayout dataset
Outperforms baseline models such as LayoutPrompter on multiple metrics
Demonstrates a scalable and interpretable layout generation process
Abstract
Automated content-aware layout generation -- the task of arranging visual elements such as text, logos, and underlays on a background canvas -- remains a fundamental yet under-explored problem in intelligent design systems. While recent advances in deep generative models and large language models (LLMs) have shown promise in structured content generation, most existing approaches lack grounding in contextual design exemplars and fall short in handling semantic alignment and visual coherence. In this work we introduce CAL-RAG, a retrieval-augmented, agentic framework for content-aware layout generation that integrates multimodal retrieval, large language models, and collaborative agentic reasoning. Our system retrieves relevant layout examples from a structured knowledge base and invokes an LLM-based layout recommender to propose structured element placements. A vision-language grader…
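The retrieve-then-recommend flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `Box`, `Exemplar`, `retrieve`, `recommend`, `grade`, and `generate` are hypothetical, retrieval is plain cosine similarity over toy embeddings, the LLM recommender is replaced by a stub that copies the best exemplar, and the grader (whose behaviour is cut off in the abstract above) is replaced by a simple geometric check that boxes stay on the canvas. The feedback loop reflects the "iterative reasoning" mentioned in the TL;DR.

```python
from dataclasses import dataclass, replace
from math import sqrt


@dataclass(frozen=True)
class Box:
    kind: str   # e.g. "text", "logo", "underlay"
    x: float    # left edge, normalized to [0, 1]
    y: float    # top edge, normalized to [0, 1]
    w: float    # width, normalized
    h: float    # height, normalized


@dataclass(frozen=True)
class Exemplar:
    embedding: tuple  # toy stand-in for a multimodal embedding
    layout: tuple     # tuple of Box


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    norm = sqrt(sum(p * p for p in a)) * sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def retrieve(query, knowledge_base, k=1):
    """Return the k exemplars whose embeddings are closest to the query."""
    ranked = sorted(knowledge_base,
                    key=lambda e: cosine(query, e.embedding),
                    reverse=True)
    return ranked[:k]


def recommend(exemplars, flagged=()):
    """Stub for the LLM recommender (assumption): copy the best exemplar's
    layout and pull any boxes flagged by the grader back onto the canvas."""
    proposal = list(exemplars[0].layout)
    for i in flagged:
        b = proposal[i]
        w, h = min(b.w, 1.0), min(b.h, 1.0)
        x = min(max(b.x, 0.0), 1.0 - w)
        y = min(max(b.y, 0.0), 1.0 - h)
        proposal[i] = replace(b, x=x, y=y, w=w, h=h)
    return proposal


def grade(layout, eps=1e-9):
    """Stub for the grader (assumption): return indices of boxes that
    extend beyond the unit canvas. An empty list means 'accepted'."""
    return [i for i, b in enumerate(layout)
            if b.x < -eps or b.y < -eps
            or b.x + b.w > 1 + eps or b.y + b.h > 1 + eps]


def generate(query, knowledge_base, max_rounds=3):
    """Retrieve exemplars, then alternate propose/grade until accepted."""
    exemplars = retrieve(query, knowledge_base)
    flagged, layout = [], []
    for _ in range(max_rounds):
        layout = recommend(exemplars, flagged)
        flagged = grade(layout)
        if not flagged:
            break
    return layout
```

In a real system the stubs would be replaced by calls to an embedding model, an LLM, and a vision-language model; the sketch only shows how the three roles could hand results to one another.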
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Data Visualization and Analytics
