LogicRank: Logic Induced Reranking for Generative Text-to-Image Systems

Björn Deiseroth; Patrick Schramowski; Hikaru Shindo; Devendra Singh Dhami; Kristian Kersting

arXiv:2208.13518·cs.AI·August 30, 2022

Open Access

TL;DR

LogicRank is a neuro-symbolic framework designed to improve the accuracy of reranking generated images from text-to-image models by enhancing logical precision and consistency.

Contribution

It introduces LogicRank, a novel neuro-symbolic reasoning method that enhances reranking accuracy and can be integrated into existing text-to-image generation workflows.

Findings

01. LogicRank reranks generated samples more accurately than CLIP.

02. State-of-the-art text-to-image models such as DALL-E struggle to generate accurate samples from precise statements.

03. LogicRank can be used to fine-tune models for better logical consistency.

Abstract

Text-to-image models have recently achieved remarkable success, producing seemingly accurate samples in photo-realistic quality. However, just as state-of-the-art language models still struggle to evaluate precise statements consistently, so do language-model-based image generation processes. In this work, we showcase the problems that state-of-the-art text-to-image models like DALL-E have with generating accurate samples from statements related to the DrawBench benchmark. Furthermore, we show that CLIP is not able to rerank those generated samples consistently. To this end, we propose LogicRank, a neuro-symbolic reasoning framework that can yield a more accurate ranking system for such precision-demanding settings. LogicRank integrates smoothly into the generation process of text-to-image models and can moreover be used for further fine-tuning towards a more logically precise model.
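The reranking step the abstract refers to, scoring each generated sample against the prompt and sorting by that score, can be sketched roughly as follows. This is only an illustration of similarity-based reranking in general, not the paper's LogicRank method: the embeddings are hand-written placeholder vectors rather than outputs of a real CLIP model, and the names `cosine`, `rerank`, and `sample_a` to `sample_c` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(prompt_embedding, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine(prompt_embedding, embedding))
        for name, embedding in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder embeddings (assumption): a real pipeline would obtain these
# from the text and image encoders of a model such as CLIP.
prompt = [1.0, 0.0, 1.0]
candidates = {
    "sample_a": [0.9, 0.1, 0.8],
    "sample_b": [0.0, 1.0, 0.0],
    "sample_c": [1.0, 0.0, 0.0],
}

ranking = rerank(prompt, candidates)
print([name for name, _ in ranking])  # ['sample_a', 'sample_c', 'sample_b']
```

A pure similarity score like this has no explicit notion of counts, negation, or spatial relations, which is the kind of gap the abstract says a logic-based ranker is meant to address.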

Taxonomy

Topics: Multimodal Machine Learning Applications · Handwritten Text Recognition Techniques · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training