PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting

Qiaowei Miao; JinSheng Quan; Kehan Li; Yawei Luo

arXiv:2405.19957·cs.CV·November 20, 2024

Open Access

TL;DR

PLA4D introduces pixel-level alignment to effectively reconcile motion and geometric priors from multiple diffusion models, enabling high-quality, consistent 4D object generation with reduced optimization time.

Contribution

The paper proposes a pixel-level alignment framework that resolves conflicts between motion and geometry priors in text-to-4D synthesis, improving consistency and efficiency.

Findings

01. Achieves superior geometric, motion, and semantic consistency in 4D generation.

02. Reduces optimization time compared to previous methods.

03. Provides an open-source, accessible tool for 4D content creation.

Abstract

Previous text-to-4D methods have leveraged multiple Score Distillation Sampling (SDS) techniques, combining motion priors from video-based diffusion models (DMs) with geometric priors from multiview DMs to implicitly guide 4D renderings. However, differences in these priors result in conflicting gradient directions during optimization, causing trade-offs between motion fidelity and geometry accuracy, and requiring substantial optimization time to reconcile the models. In this paper, we introduce Pixel-Level Alignment for text-driven 4D Gaussian splatting (PLA4D) to resolve this motion-geometry conflict. PLA4D provides an anchor reference, i.e., text-generated video, to align the rendering process conditioned by different DMs in pixel space. For static alignment, our approach introduces a focal alignment method and Gaussian-Mesh contrastive learning to…
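The gradient conflict described in the abstract can be illustrated with a toy sketch. This is not PLA4D's actual loss or code: the gradient values, the helper names `cosine_similarity` and `pixel_alignment_grad`, and the mean-squared-error form of the anchor loss are all assumptions made for illustration. It shows how two priors can push the same rendered pixels in opposing directions, whereas a single pixel-space target yields one unambiguous gradient.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two flat gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pixel_alignment_grad(rendered, anchor):
    """Gradient of the mean squared pixel error w.r.t. the rendered pixels.

    Assumption: an MSE loss stands in for whatever pixel-level objective
    the paper actually uses.
    """
    n = len(rendered)
    return [2.0 * (r - a) / n for r, a in zip(rendered, anchor)]


# Hypothetical gradients from a motion prior and a geometry prior,
# both taken w.r.t. the same four rendered pixels.
grad_motion = [1.0, 0.5, -0.2, 0.0]
grad_geometry = [-0.8, -0.4, 0.3, 0.1]

# A negative cosine means the two priors pull in opposing directions.
conflict = cosine_similarity(grad_motion, grad_geometry)

# Hypothetical rendered pixels and the matching pixels of an anchor frame.
rendered = [0.2, 0.4, 0.6, 0.8]
anchor = [0.25, 0.35, 0.6, 0.9]

# With one shared pixel-space target there is a single gradient direction.
grad_anchor = pixel_alignment_grad(rendered, anchor)

print(f"cosine(motion, geometry) = {conflict:.3f}")
print("anchor gradient:", grad_anchor)
```

In this toy setting the cosine between the two prior gradients is negative, so following one prior undoes part of the other. The anchor gradient simply points from each rendered pixel toward its reference value.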

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Image and Video Retrieval Techniques

Methods: Focus · ALIGN · Contrastive Learning · Diffusion