EPIC: Efficient Predicate-Guided Inference-Time Control for Compositional Text-to-Image Generation
Sunung Mun, Sunghyun Cho, Jungseul Ok

TL;DR
EPIC is a training-free, inference-time refinement framework for compositional text-to-image generation that improves both accuracy and efficiency through predicate-guided search and targeted editing.
Contribution
EPIC introduces a predicate-guided search method for inference-time refinement in compositional text-to-image (T2I) generation that substantially improves accuracy while reducing computational cost.
Findings
Prompt-level accuracy increased from 34.16% to 71.46%.
EPIC outperforms prior refinement baselines by 19.23 points.
Reduces image-model executions by 31%, MLLM calls by 72%, and tokens by 81%.
Abstract
Recent text-to-image (T2I) generators can synthesize realistic images, but still struggle with compositional prompts involving multiple objects, counts, attributes, and relations. We introduce EPIC (Efficient Predicate-Guided Inference-Time Control), a training-free inference-time refinement framework for compositional T2I generation. EPIC casts refinement as predicate-guided search: it parses the original prompt once into a fixed visual program of object variables and typed predicates, covering checkable conditions such as object presence, counts, attributes, and relations. Each generated or edited image is verified against this program using visual evidence extracted from that image. An image is judged to satisfy the prompt only when all predicates are satisfied; otherwise, failed predicates decide the next step, routing local failures to targeted editing and global failures to…
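The verify-then-route loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Predicate`, `VisualProgram`, `Evidence`, `holds`, `verify`, and `next_step` are hypothetical. Visual evidence is assumed to arrive already extracted as object counts, attribute sets, and relation triples. The `scope` tag that marks a predicate as local or global is also an assumption, set by the caller, because the summary does not say how EPIC classifies failures. The abstract is truncated before it names the action taken on global failures, so the sketch only returns the label `"global"` for that case.

```python
from dataclasses import dataclass, field
from typing import Literal

# Predicate kinds mirroring the checkable conditions listed in the abstract.
Kind = Literal["presence", "count", "attribute", "relation"]


@dataclass(frozen=True)
class Predicate:
    kind: Kind
    args: tuple  # e.g. ("dog",), ("dog", 2), ("cat", "black"), ("dog", "left_of", "cat")
    # Hypothetical: caller-assigned scope deciding targeted edit vs. global handling.
    scope: Literal["local", "global"] = "local"


@dataclass(frozen=True)
class VisualProgram:
    # Parsed once from the prompt and reused unchanged for every candidate image.
    objects: tuple[str, ...]
    predicates: tuple[Predicate, ...]


@dataclass
class Evidence:
    # Hypothetical evidence format extracted from one generated or edited image.
    counts: dict[str, int] = field(default_factory=dict)
    attributes: dict[str, set[str]] = field(default_factory=dict)
    relations: set[tuple[str, str, str]] = field(default_factory=set)


def holds(p: Predicate, ev: Evidence) -> bool:
    """Check one predicate against the evidence from a single image."""
    if p.kind == "presence":
        (obj,) = p.args
        return ev.counts.get(obj, 0) > 0
    if p.kind == "count":
        obj, n = p.args
        return ev.counts.get(obj, 0) == n
    if p.kind == "attribute":
        obj, attr = p.args
        return attr in ev.attributes.get(obj, set())
    if p.kind == "relation":
        # Simplification: exact match on a (subject, relation, object) triple.
        return p.args in ev.relations
    raise ValueError(f"unknown predicate kind: {p.kind}")


def verify(program: VisualProgram, ev: Evidence) -> list[Predicate]:
    """Return the failed predicates; an empty list means the image satisfies the prompt."""
    return [p for p in program.predicates if not holds(p, ev)]


def next_step(failed: list[Predicate]) -> str:
    """Route on failed predicates: accept, targeted edit, or the (elided) global branch."""
    if not failed:
        return "accept"
    if any(p.scope == "global" for p in failed):
        return "global"  # the abstract is truncated before naming this action
    return "targeted_edit"


# Usage: prompt like "two dogs to the left of a black cat" (hypothetical parse).
program = VisualProgram(
    objects=("dog", "cat"),
    predicates=(
        Predicate("count", ("dog", 2), scope="global"),
        Predicate("attribute", ("cat", "black"), scope="local"),
        Predicate("relation", ("dog", "left_of", "cat"), scope="local"),
    ),
)
ev = Evidence(
    counts={"dog": 2, "cat": 1},
    attributes={"cat": {"white"}},
    relations={("dog", "left_of", "cat")},
)
failed = verify(program, ev)
print([p.kind for p in failed], next_step(failed))  # ['attribute'] targeted_edit
```

Because the program is built once and reused, verifying each new candidate is a single pass over a fixed predicate list. Acceptance requires an empty failure list, which matches the rule that an image satisfies the prompt only when every predicate holds.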
