Sculpt4D: Generating 4D Shapes via Sparse-Attention Diffusion Transformers

Minghao Yin; Wenbo Hu; Jiale Xu; Ying Shan; Kai Han

arXiv:2604.21592·cs.CV·April 24, 2026

TL;DR

Sculpt4D is a native 4D generative framework that adds Block Sparse Attention to a pretrained 3D Diffusion Transformer, cutting total network computation by 56% relative to full attention while maintaining high fidelity on complex dynamic shapes.

Contribution

The paper proposes a Block Sparse Attention mechanism inside a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), enabling scalable, high-quality 4D shape synthesis with temporal coherence.

Findings

01. Achieves state-of-the-art results in 4D shape generation.

02. Reduces network computation by 56% compared to full attention.

03. Models complex spatiotemporal dependencies with high fidelity.

Abstract

Recent breakthroughs in 3D generative modeling have yielded remarkable progress in static shape synthesis, yet high-fidelity dynamic 4D generation remains elusive, hindered by temporal artifacts and prohibitive computational demands. We present Sculpt4D, a native 4D generative framework that integrates efficient temporal modeling into a pretrained 3D Diffusion Transformer (Hunyuan3D 2.1), thereby mitigating the scarcity of 4D training data. At its core lies a Block Sparse Attention mechanism that preserves object identity by anchoring to the initial frame while capturing rich motion dynamics via a time-decaying sparse mask. This design models complex spatiotemporal dependencies with high fidelity while sidestepping the quadratic overhead of full attention, reducing total network computation by 56%. Consequently, Sculpt4D establishes a new state-of-the-art in…
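To make the masking idea in the abstract concrete, here is a minimal sketch of a block-level attention mask that always keeps the first frame visible and keeps fewer key blocks as temporal distance grows. This is an illustration under stated assumptions, not the paper's implementation: the function names `block_sparse_mask` and `density`, the `decay` parameter, the geometric decay schedule, and the rule of keeping the leading blocks of each frame are all hypothetical choices, since the abstract does not specify them.

```python
import math


def block_sparse_mask(num_frames, blocks_per_frame, decay=0.5):
    """Build a boolean block mask of size N x N, N = num_frames * blocks_per_frame.

    mask[q][k] is True when query block q may attend to key block k.
    Assumed (hypothetical) rules:
      * every query sees all blocks of frame 0 (the anchor frame);
      * every query sees all blocks of its own frame;
      * for another frame at temporal distance d, only the first
        ceil(blocks_per_frame * decay**d) key blocks are kept.
    """
    n = num_frames * blocks_per_frame
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        query_frame = q // blocks_per_frame
        for key_frame in range(num_frames):
            if key_frame == 0 or key_frame == query_frame:
                kept = blocks_per_frame
            else:
                distance = abs(query_frame - key_frame)
                kept = math.ceil(blocks_per_frame * decay ** distance)
            start = key_frame * blocks_per_frame
            for k in range(start, start + kept):
                mask[q][k] = True
    return mask


def density(mask):
    """Fraction of (query, key) block pairs that are kept."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


if __name__ == "__main__":
    mask = block_sparse_mask(num_frames=8, blocks_per_frame=16, decay=0.5)
    print(f"kept fraction of attention blocks: {density(mask):.3f}")
```

The kept fraction printed by this toy mask depends entirely on the assumed parameters and is not meant to reproduce the 56% reduction reported above, which refers to total network computation rather than attention blocks alone.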
