DreamCS: Geometry-Aware Text-to-3D Generation with Unpaired 3D Reward Supervision
Xiandong Zou, Ruihao Xia, Hongsong Wang, Pan Zhou

TL;DR
DreamCS introduces a novel framework for text-to-3D generation that leverages unpaired 3D preference data and a new reward model to produce geometrically accurate and human-aligned 3D assets.
Contribution
It develops the first large-scale unpaired 3D preference dataset and a reward model trained directly on this data, improving human preference alignment in 3D generation.
Findings
Outperforms prior methods in producing human-preferred 3D assets.
Effectively learns human-aligned 3D geometric preferences without paired comparisons.
Enhances both implicit and explicit 3D generation quality.
Abstract
While text-to-3D generation has attracted growing interest, existing methods often struggle to produce 3D assets that align well with human preferences. Current preference alignment techniques for 3D content typically rely on hard-to-collect preference-paired multi-view 2D images to train 2D reward models, which then guide 3D generation -- leading to geometric artifacts due to their inherent 2D bias. To address these limitations, we construct 3D-MeshPref, the first large-scale unpaired 3D preference dataset, featuring diverse 3D meshes annotated by a large language model and refined by human evaluators. We then develop RewardCS, the first reward model trained directly on unpaired 3D-MeshPref data using a novel Cauchy-Schwarz divergence objective, enabling effective learning of human-aligned 3D geometric preferences without requiring paired comparisons. Building on this, we propose…
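The abstract names a Cauchy-Schwarz (CS) divergence objective, but this summary does not give its exact form. As a rough illustration only, the sketch below computes the standard kernel-based empirical CS divergence between two unpaired sample sets, for example feature vectors of preferred and dispreferred meshes. The function names, the Gaussian kernel, the bandwidth `sigma`, and the toy 2-D features are all assumptions made for illustration; they are not taken from the paper, and the actual RewardCS loss may differ.

```python
import math

def gaussian_kernel(u, v, sigma):
    """Gaussian kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all cross pairs of the two sample sets."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def cs_divergence(xs, ys, sigma=1.0):
    """Empirical CS divergence: log k(X,X) + log k(Y,Y) - 2 log k(X,Y).

    It is symmetric, non-negative, and zero when the two sets coincide.
    No pairing between individual elements of xs and ys is required.
    """
    kxx = mean_kernel(xs, xs, sigma)
    kyy = mean_kernel(ys, ys, sigma)
    kxy = mean_kernel(xs, ys, sigma)
    return math.log(kxx) + math.log(kyy) - 2.0 * math.log(kxy)

# Hypothetical toy features standing in for mesh embeddings.
preferred = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
dispreferred = [[2.0, 2.1], [1.9, 2.2], [2.1, 1.8]]

separation = cs_divergence(preferred, dispreferred)
self_divergence = cs_divergence(preferred, preferred)
```

In this toy setup, `separation` is large because the two clusters barely overlap, while `self_divergence` is zero up to floating-point error. A reward model trained with an objective of this kind would plausibly be encouraged to push the two groups apart in its score or feature space, though how RewardCS does this is not specified here.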
Peer Reviews
Decision: ICLR 2026 Poster
- The Cauchy-Schwarz divergence training approach for unpaired preference data offers a potentially generalizable framework applicable beyond 3D generation tasks.
- 3D-MeshPref provides a human-verified dataset of diverse unpaired 3D meshes, which may be a good community resource that helps reduce dependence on expensive paired annotations.
- The differentiable meshization pipeline enables end-to-end optimization with geometry-aware supervision, demonstrating technical soundness in integrating
- The CS divergence receives extensive theoretical treatment (Appendix B) but lacks empirical validation. No ablations demonstrate performance degradation without this loss, no clustering baselines justify its necessity, and Table 4's λ variations don't compare against removing the term entirely. The mathematical formalism appears to add complexity without proven practical benefit. I ask the authors to further elaborate upon this point.
- Table 1 exposes a critical flaw: RewardCS underperforms R
* The paper correctly diagnoses a key failure mode of 2D preference signals in 3D generation and proposes a principled 3D reward to address it.
* The paper provides a formal mathematical justification, establishing the asymptotic equivalence between the proposed unpaired objective (using CS divergence) and traditional paired supervision, which instills high confidence in the method's soundness.
* Building 3D-MeshPref at 30k+ meshes with human-verified thresholds is a non-trivial engineering con
* A primary concern is the use of the "GA" (3D Geometry-Asset Alignment Reward) metric. The authors state (Section 4, Appendix F.1) that this metric is "based on RewardCS" and "derived from RewardCS." Using a variant of their own proposed model as a key evaluation metric creates a significant risk of "metric-method coupling," where the metric may be inherently biased to favor the architecture and training objective of the method being tested. This potential bias makes the "GA" scores in Table 1
1. The proposed RewardCS model directly tackles the fundamental issue of 2D bias in existing text-to-3D preference alignment methods, which leads to geometric artifacts like the Janus problem. The 3D-geometric aware model also bypasses the need for hard-to-collect paired preference data.
2. The use of Cauchy-Schwarz divergence for unpaired preference learning is both effective in practice and supported by a solid theoretical proof of its equivalence to paired learning.
3. The method is shown to
1. The entire DreamCS framework is built upon the increasingly dated Score Distillation Sampling (SDS) paradigm, which is slow, optimization-based, and prone to artifacts. The field is rapidly moving towards fast, feed-forward text-to-3D generators (e.g., Trellis). A more forward-looking and potentially more effective approach for preference alignment would be to directly fine-tune these feed-forward models using human preferences, rather than adding a complex reward guidance mechanism to a slow
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Interactive and Immersive Displays
Methods: ALIGN
