Bridging the Intention-Expression Gap: Aligning Multi-Dimensional Preferences via Hierarchical Relevance Feedback in Text-to-Image Diffusion
Wenxi Wang, Hongbin Liu, Mingqian Li, Junyan Yuan, Junqi Zhang

TL;DR
This paper introduces HRFD, a training-free, hierarchical relevance feedback framework that aligns multi-dimensional visual preferences in text-to-image diffusion, reducing cognitive load and improving image relevance.
Contribution
The paper presents a hierarchical relevance feedback method that decouples multi-dimensional preference inference and uses statistical measures, all without any model training.
Findings
HRFD outperforms baseline methods in capturing user intent.
It effectively handles conflicting multi-dimensional preferences.
The framework is model-agnostic and requires no additional training.
Abstract
Users often possess a clear visual intent but struggle to articulate it precisely in language. This intention-expression gap makes aligning generated images with latent visual preferences a fundamental challenge in text-to-image diffusion models. Existing methods either require model training, sacrificing flexibility, or rely on textual feedback, imposing a heavy cognitive burden. Although recent training-free methods use click-based binary preference feedback to reduce user effort, they force Foundation Models (FMs) to infer preferences at the semantic level. When faced with multi-dimensional preferences, FMs suffer from inference overload and fail to identify exact preferred feature values under conflicting user signals. Consequently, a flexible framework for multi-dimensional feature alignment remains absent. To address this, we propose a Hierarchical Relevance Feedback-Driven (HRFD)…
