EG4D: Explicit Generation of 4D Object without Score Distillation
Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

TL;DR
EG4D introduces a multi-stage framework that explicitly generates high-quality, temporally consistent 4D objects from a single image without relying on score distillation, addressing key challenges in dynamic 3D asset synthesis.
Contribution
It presents EG4D, a novel multi-stage method combining attention injection, Gaussian Splatting, and diffusion priors to improve 4D object generation quality and consistency.
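To make the idea of training-free attention injection concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's implementation: the blend-with-reference form, the function names (`attention`, `injected_attention`), and the weight `alpha` are all hypothetical, and the learned query/key/value projections of a real diffusion model are replaced by identity maps. The point it shows is that consistency can be encouraged by letting each frame's tokens also attend to a reference frame's tokens, without updating any weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        dim = len(values[0])
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def injected_attention(frame_tokens, ref_tokens, alpha=0.5):
    """Hypothetical injection: blend self-attention with attention to a reference frame.

    alpha = 0 gives plain self-attention; alpha = 1 attends only to the reference.
    No parameters are trained, which is what "training-free" means here.
    """
    own = attention(frame_tokens, frame_tokens, frame_tokens)
    ref = attention(frame_tokens, ref_tokens, ref_tokens)
    return [
        [(1.0 - alpha) * o + alpha * r for o, r in zip(own_row, ref_row)]
        for own_row, ref_row in zip(own, ref)
    ]


# Toy usage: two 2-D tokens for the current frame, two for the reference frame.
frame = [[1.0, 0.0], [0.0, 1.0]]
reference = [[2.0, 2.0], [2.0, 2.0]]
blended = injected_attention(frame, reference, alpha=0.5)
```

In a real pipeline the same blending would be applied inside the attention layers of a pretrained video or multi-view diffusion model; how EG4D chooses the reference and the blending rule is not specified on this page.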
Findings
Outperforms baselines in generation quality
Produces temporally consistent multi-view videos
Reduces artifacts such as over-saturation and the Janus problem
Abstract
In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on the score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, leading to unsatisfactory results with defects such as over-saturation and the Janus problem. Therefore, inspired by recent progress in video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle the practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose EG4D, a novel multi-stage framework that…
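The contrast the abstract draws between score distillation and explicit generation can be written out as follows. The SDS gradient is the standard formulation from the text-to-3D literature; the explicit objective is an assumed generic photometric loss over generated frames, shown only for illustration and not taken from the paper. Here g renders the representation with parameters θ, R(θ; v, τ) is the render at view v and time τ, and I_{τ,v} is the corresponding generated multi-view video frame (all symbols are notation introduced for this sketch).

```latex
% SDS: the diffusion model's noise residual is used as a gradient on x = g(\theta)
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x_t = \alpha_t x + \sigma_t \epsilon .

% Explicit supervision (assumed generic form): fit renders to generated frames
\mathcal{L}_{\mathrm{rec}}(\theta)
  = \sum_{\tau=1}^{T} \sum_{v=1}^{V}
    \bigl\| R(\theta; v, \tau) - I_{\tau, v} \bigr\|_2^2 .
```

In the first case the representation never sees a concrete target image, only a noisy gradient signal; in the second it is fitted to explicit images, so the quality of the result depends on how consistent those generated frames are across time and views.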
Peer Reviews
Decision: ICLR 2025 Poster
1. EG4D introduces a unique multi-stage approach that successfully avoids issues associated with score distillation, such as over-saturation and Janus artifacts.
2. The attention injection mechanism helps ensure temporal consistency.
3. Quantitative and qualitative results demonstrate that EG4D produces high-quality 4D content, achieving better alignment with the reference view and more realistic motion than baselines such as DreamGaussian4D and Animate124.
4. The evaluation is com
1. Lack of baseline methods. Recent approaches such as L4GM and Efficient4D (which also leverage 4D Gaussian Splatting) should definitely be included for comparison.
2. Lack of comparative analysis. The attention injection technique is novel, but without comparisons to other methods, its effectiveness remains uncertain.
3. Texture inconsistencies. Despite color transformation, subtle color and texture shifts remain, calling the robustness of temporal consistency into question.
4. Limited Support for High
- A multiscale augmentation is proposed for rendering more details.
- A training-free attention injection strategy is introduced to ensure consistency.
- A Diffusion Refinement stage is included to refine the details.
- In Fig. 1, it seems that the multiscale renderer is not used for the "Diffusion Refinement" stage. What is the motivation for this change? Is it necessary?
- How does the proposed Diffusion Refinement stage differ from the one in DreamGaussian4D?
1. The proposed pipeline is clearly motivated and reasonable.
2. The writing is clear and easy to follow, with a well-structured introduction and related work section.
3. Extensive experiments show clear performance improvements over previous methods (Table 1, Table 2).
4. The authors provide a comprehensive ablation study, which helps justify the effectiveness of attention injection, color transformation, the multi-scale renderer, and the refinement strategies.
5. The supplementary materials are sufficient
1. As shown in the video, the results are flickering and blurry. The quality does not seem to be as good as that of Diffusion4D [1] (see the examples on the Diffusion4D project page). It would be valuable to add a discussion and analysis comparing the proposed method with Diffusion4D (e.g., generation time, temporal consistency, image fidelity, 3D consistency).
2. The method is built on SVD and SV3D for 4D generation. It would strengthen the paper if the authors could discuss the different designs bet
Taxonomy
Topics: Interactive and Immersive Displays · Augmented Reality Applications · Constraint Satisfaction and Optimization
Methods: Diffusion
