Bridging the Intention-Expression Gap: Aligning Multi-Dimensional Preferences via Hierarchical Relevance Feedback in Text-to-Image Diffusion

Wenxi Wang; Hongbin Liu; Mingqian Li; Junyan Yuan; Junqi Zhang

arXiv:2603.14936·cs.CV·May 19, 2026

TL;DR

This paper introduces HRFD, a training-free, hierarchical relevance feedback framework that aligns generated images with users' multi-dimensional visual preferences in text-to-image diffusion, reducing users' cognitive load and improving image relevance.

Contribution

The paper presents a novel hierarchical relevance feedback method that decouples multi-dimensional preference inference and employs statistical measures, operating entirely outside the model training process.

Findings

01. HRFD outperforms baseline methods in capturing user intent.

02. It effectively handles conflicting multi-dimensional preferences.

03. The framework is model-agnostic and requires no additional training.

Abstract

Users often possess a clear visual intent but struggle to articulate it precisely in language. This intention-expression gap makes aligning generated images with latent visual preferences a fundamental challenge in text-to-image diffusion models. Existing methods either require model training, sacrificing flexibility, or rely on textual feedback, imposing a heavy cognitive burden. Although recent training-free methods use click-based binary preference feedback to reduce user effort, they force Foundation Models (FMs) to infer preferences at the semantic level. When faced with multi-dimensional preferences, FMs suffer from inference overload and fail to identify exact preferred feature values under conflicting user signals. Consequently, a flexible framework for multi-dimensional feature alignment remains absent. To address this, we propose a Hierarchical Relevance Feedback-Driven (HRFD)…
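
To make the problem of conflicting multi-dimensional signals concrete, the sketch below tallies click-based like/dislike feedback separately per feature dimension. It is an illustration only and not the HRFD procedure, which the truncated abstract does not describe. The dimension names (`style`, `palette`), the feature values, the function `per_dimension_scores`, and the scoring rule (likes minus dislikes, divided by total clicks) are all assumptions made for this example.

```python
from collections import defaultdict


def per_dimension_scores(feedback):
    """Score each feature value within its own dimension.

    feedback: list of (features, liked) pairs, where features maps a
    hypothetical dimension name to the value shown in one image and
    liked is the user's binary click.
    Returns {dimension: {value: score}} with score in [-1, 1].
    """
    # dimension -> value -> [like_count, dislike_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for features, liked in feedback:
        for dim, value in features.items():
            counts[dim][value][0 if liked else 1] += 1

    scores = {}
    for dim, values in counts.items():
        scores[dim] = {
            value: (likes - dislikes) / (likes + dislikes)
            for value, (likes, dislikes) in values.items()
        }
    return scores


# Hypothetical feedback: three generated images, each clicked once.
feedback = [
    ({"style": "watercolor", "palette": "warm"}, True),
    ({"style": "watercolor", "palette": "cool"}, False),
    ({"style": "photo", "palette": "warm"}, True),
]
scores = per_dimension_scores(feedback)
```

In this toy data, "watercolor" is liked once and disliked once, so its score is neutral, while "warm" is liked both times it appears. Read at the whole-image level, the first two clicks look contradictory. Read per dimension, they suggest the palette, not the style, drove the user's choice.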
