Txt2Img-MHN: Remote Sensing Image Generation from Text Using Modern Hopfield Networks
Yonghao Xu, Weikang Yu, Pedram Ghamisi, Michael Kopp, and Sepp Hochreiter

TL;DR
This paper introduces Txt2Img-MHN, a novel hierarchical Hopfield network approach for generating realistic remote sensing images from text descriptions, improving semantic accuracy and image quality over existing methods.
Contribution
The paper proposes a hierarchical prototype learning framework using modern Hopfield layers for text-to-image remote sensing synthesis, which enhances semantic representation and image realism.
Findings
Outperforms existing methods in generating realistic remote sensing images
Zero-shot classification accuracy indicates high semantic consistency
Demonstrates effectiveness on benchmark remote sensing datasets
Abstract
The synthesis of high-resolution remote sensing images based on text descriptions has great potential in many practical application scenarios. Although deep neural networks have achieved great success in many important remote sensing tasks, generating realistic remote sensing images from text descriptions is still very difficult. To address this challenge, we propose a novel text-to-image modern Hopfield network (Txt2Img-MHN). The main idea of Txt2Img-MHN is to conduct hierarchical prototype learning on both text and image embeddings with modern Hopfield layers. Instead of directly learning concrete but highly diverse text-image joint feature representations for different semantics, Txt2Img-MHN aims to learn the most representative prototypes from text-image embeddings, achieving a coarse-to-fine learning strategy. These learned prototypes can then be utilized to represent more complex…
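The abstract refers to modern Hopfield layers without showing the retrieval step they are built on. The sketch below illustrates the generic modern Hopfield update, in which a query is replaced by a softmax-weighted combination of stored prototypes. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy two-dimensional prototypes, and the value of `beta` are all hypothetical, and in the paper the prototypes are learned from text-image embeddings rather than fixed by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_retrieve(query, prototypes, beta=1.0):
    """One modern Hopfield update step (hypothetical helper).

    Scores each stored prototype by its scaled dot product with the
    query, then returns the softmax-weighted sum of the prototypes.
    """
    scores = [
        beta * sum(q * p for q, p in zip(query, proto))
        for proto in prototypes
    ]
    weights = softmax(scores)
    dim = len(query)
    return [
        sum(w * proto[i] for w, proto in zip(weights, prototypes))
        for i in range(dim)
    ]


# Toy prototypes (assumed for illustration only).
prototypes = [[1.0, 0.0], [0.0, 1.0]]

# A noisy query close to the first prototype.
retrieved = hopfield_retrieve([0.9, 0.1], prototypes, beta=8.0)
```

With a larger `beta` the softmax sharpens and the output moves closer to the single best-matching prototype; with a smaller `beta` the output blends several prototypes. Stacking such layers with different numbers of prototypes is one plausible way to read the coarse-to-fine strategy the abstract describes, though the exact arrangement is not specified here.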
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
