TL;DR
This paper introduces a method for measuring the semantic similarity of two texts by comparing the image distributions they evoke under a text-conditioned generative model. The resulting scores are reported to align with human-annotated similarity and to be more interpretable, since the evoked images can be inspected directly.
Contribution
The paper proposes quantifying the semantic similarity between two texts as the Jeffreys divergence between the reverse-time diffusion SDEs that each text induces, so that similarity is grounded in the images the texts generate.
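For readers unfamiliar with the term, the Jeffreys divergence is the symmetrized Kullback-Leibler divergence, J(p, q) = KL(p || q) + KL(q || p). The sketch below illustrates only this definition on toy discrete distributions; it is not the paper's SDE-based computation. The arrays `p_a` and `p_b` and both function names are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def jeffreys_divergence(p, q):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical toy distributions standing in for the image distributions
# evoked by two different prompts (assumption: three discrete outcomes).
p_a = np.array([0.7, 0.2, 0.1])
p_b = np.array([0.1, 0.2, 0.7])

print(jeffreys_divergence(p_a, p_b))  # larger value = less similar
print(jeffreys_divergence(p_a, p_a))  # identical distributions give 0
```

Unlike the plain KL divergence, this quantity does not depend on the order of its arguments, which is a natural property for a similarity measure between two texts.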
Findings
The proposed similarity measure aligns well with human-annotated similarity scores.
It offers a new perspective for evaluating text-conditioned generative models.
It makes semantic representations more interpretable, because the comparison is expressed through generated images.
Abstract
The semantic similarity between sample expressions measures the distance between their latent 'meaning'. These meanings are themselves typically represented by textual expressions. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather based on the imagery they evoke. While this is not possible with humans, generative models allow us to easily visualize and compare generated images, or their distribution, evoked by a textual prompt. Therefore, we characterize the semantic similarity between two textual expressions simply as the distance between image distributions they induce, or 'conjure.' We show that by choosing the Jeffreys divergence between the reverse-time diffusion stochastic differential equations (SDEs) induced by each textual expression, this can be directly computed via…