Human-Aligned Generative Perception: Bridging Psychophysics and Generative Models
Antara Titikhsha, Om Kulkarni, Dharun Muthaiah

TL;DR
This paper introduces a lightweight human-perception model that guides generative image models, improving geometric accuracy and semantic alignment without specialized training.
Contribution
It proposes a Human Perception Embedding (HPE) teacher that steers diffusion models toward geometric constraints and human-like shape understanding.
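The abstract describes the teacher as trained on the THINGS triplet dataset, in which people pick the odd one out of three objects. One common way to fit an embedding to such judgments is a softmax over the three pairwise similarities. The sketch below illustrates that idea only: the dot-product similarity, the function names, and the toy embeddings are assumptions, and the paper's actual training objective is not stated in this summary.

```python
import numpy as np

def pair_similarities(emb, triplet):
    """Dot-product similarity of the pair that remains when each item is removed."""
    i, j, k = triplet
    return np.array([
        emb[j] @ emb[k],  # pair left when item i is the odd one out
        emb[i] @ emb[k],  # pair left when item j is the odd one out
        emb[i] @ emb[j],  # pair left when item k is the odd one out
    ])

def odd_one_out_loss(emb, triplet, odd_index):
    """Cross-entropy for one triplet (assumed objective, not the paper's).

    emb:       (n_items, dim) embedding matrix from a hypothetical teacher
    triplet:   three item indices (i, j, k)
    odd_index: position (0, 1, or 2) of the item humans chose as odd one out
    """
    sims = pair_similarities(emb, triplet)
    # Numerically stable log-softmax over the three pair similarities.
    shifted = sims - sims.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[odd_index])

def predict_odd_one_out(emb, triplet):
    """Return the position of the item excluded from the most similar pair."""
    return int(np.argmax(pair_similarities(emb, triplet)))

# Toy embeddings (made up): items 0 and 1 are similar, item 2 differs.
toy_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

With these toy values, `predict_odd_one_out(toy_emb, (0, 1, 2))` selects position 2, and the loss is lowest when the human choice is also position 2.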
Findings
Improved geometric control in image generation.
Achieved 80% better semantic alignment.
Enabled zero-shot transfer of complex shapes.
Abstract
Text-to-image diffusion models generate highly detailed textures, yet they often rely on surface appearance and fail to follow strict geometric constraints, particularly when those constraints conflict with the style implied by the text prompt. This reflects a broader semantic gap between human perception and current generative models. We investigate whether geometric understanding can be introduced without specialized training by using lightweight, off-the-shelf discriminators as external guidance signals. We propose a Human Perception Embedding (HPE) teacher trained on the THINGS triplet dataset, which captures human sensitivity to object shape. By injecting gradients from this teacher into the latent diffusion process, we show that geometry and style can be separated in a controllable manner. We evaluate this approach across three architectures: Stable Diffusion v1.5 with a U-Net…
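Injecting teacher gradients into the latent process can be pictured as an extra correction after each denoising update. The following is a minimal sketch under strong assumptions, not the authors' implementation: the teacher is a toy linear map `W`, the denoiser is an identity placeholder standing in for the diffusion model, and `guided_step`, `target`, and the step size are all hypothetical.

```python
import numpy as np

def teacher_loss_and_grad(z, W, target):
    """Toy teacher: squared error between the embedding W @ z and a target.

    Returns the loss and its gradient with respect to the latent z.
    """
    residual = W @ z - target
    loss = float(residual @ residual)
    grad = 2.0 * W.T @ residual
    return loss, grad

def guided_step(z, denoise_fn, W, target, scale):
    """One sampling step: denoiser update, then a nudge down the teacher gradient."""
    z_next = denoise_fn(z)
    _, grad = teacher_loss_and_grad(z_next, W, target)
    return z_next - scale * grad

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))      # hypothetical teacher weights
target = rng.normal(size=4)      # hypothetical target shape embedding
z = rng.normal(size=8)           # stand-in latent

def identity_denoiser(latent):
    """Placeholder for the diffusion model's denoising update."""
    return latent

# Step size tied to the largest singular value of W so each step lowers the loss.
scale = 0.5 / np.linalg.norm(W, 2) ** 2

loss_before, grad_before = teacher_loss_and_grad(z, W, target)
for _ in range(50):
    z = guided_step(z, identity_denoiser, W, target, scale)
loss_after, _ = teacher_loss_and_grad(z, W, target)
```

In a real pipeline the gradient would typically come from automatic differentiation through a learned teacher, and the guidance scale would trade off adherence to the shape signal against fidelity to the text prompt.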
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Aesthetic Perception and Analysis
