Preference Score Distillation: Leveraging 2D Rewards to Align Text-to-3D Generation with Human Preference

Jiaqi Leng; Shuyuan Tu; Haidong Cao; Sicheng Xie; Daoguo Dong; Zuxuan Wu; Yu-Gang Jiang

arXiv:2603.01594·cs.CV·March 3, 2026

TL;DR

This paper introduces Preference Score Distillation (PSD), a novel framework that uses pretrained 2D reward models and classifier-free guidance to improve human preference alignment in text-to-3D generation without requiring 3D training data.

Contribution

The paper proposes a new optimization-based method that leverages 2D reward models and CFG-style mechanisms for human-aligned text-to-3D synthesis, addressing data scarcity and performance constraints.

Findings

01 PSD outperforms existing methods in aesthetic metrics.

02 It seamlessly integrates with various pipelines.

03 The approach demonstrates strong extensibility.

Abstract

Human preference alignment presents a critical yet underexplored challenge for diffusion models in text-to-3D generation. Existing solutions typically require task-specific fine-tuning, posing significant hurdles in data-scarce 3D domains. To address this, we propose Preference Score Distillation (PSD), an optimization-based framework that leverages pretrained 2D reward models for human-aligned text-to-3D synthesis without 3D training data. Our key insight stems from the incompatibility of pixel-level gradients: because reward models are not trained on noisy samples, directly applying 2D reward gradients disturbs the denoising process. Noticing that a similar issue occurs with naive classifier guidance in conditional diffusion models, we fundamentally rethink preference alignment as a classifier-free guidance (CFG)-style mechanism through our implicit reward model.…
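
For readers unfamiliar with the CFG mechanism the abstract refers to, the following is a minimal sketch of the standard classifier-free guidance combination only. It is not PSD itself: the abstract excerpt does not give PSD's implicit reward formulation, so none is shown. The function name `cfg_combine`, the array shapes, and the random stand-ins for the denoiser outputs are all illustrative assumptions.

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance (illustrative, not the PSD update).

    Extrapolates from the unconditional noise prediction toward the
    conditional one. A scale of 0 returns the unconditional prediction,
    and a scale of 1 returns the conditional prediction.
    """
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Random arrays stand in for the outputs of a diffusion denoiser
# (assumption: a latent of shape channels x height x width).
rng = np.random.default_rng(0)
shape = (4, 8, 8)
eps_uncond = rng.standard_normal(shape)
eps_cond = rng.standard_normal(shape)

# A guidance scale above 1 pushes the prediction past the conditional one.
eps_guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5)
```

The point of the sketch is that guidance is expressed as a difference between two predictions from the same noise-aware denoiser, which avoids evaluating a separate model on noisy inputs it was never trained on. Per the abstract, PSD recasts preference alignment in this style instead of backpropagating pixel-level reward gradients directly.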

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis