Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention

Shuang Wu; Youtian Lin; Feihu Zhang; Yifei Zeng; Yikang Yang; Yajie Bao; Jiachen Qian; Siyu Zhu; Xun Cao; Philip Torr; Yao Yao

arXiv:2505.17412 · cs.CV · May 27, 2025

TL;DR

Direct3D-S2 introduces a scalable framework for high-resolution 3D shape generation using sparse volumetric data and a novel Spatial Sparse Attention mechanism, significantly reducing computational costs and enabling gigascale 3D modeling on limited hardware.

Contribution

The paper presents a new sparse volume-based 3D generation framework with Spatial Sparse Attention, improving efficiency and quality over previous methods and making gigascale 3D generation more accessible.

Findings

1. SSA achieves a 3.9x speedup in the forward pass and a 9.6x speedup in the backward pass.
2. Surpasses state-of-the-art methods in both generation quality and efficiency.
3. Enables training at 1024³ resolution with only 8 GPUs.

Abstract

Generating high-resolution 3D shapes using volumetric representations such as Signed Distance Functions (SDFs) presents substantial computational and memory challenges. We introduce Direct3D-S2, a scalable 3D generation framework based on sparse volumes that achieves superior output quality with dramatically reduced training costs. Our key innovation is the Spatial Sparse Attention (SSA) mechanism, which greatly enhances the efficiency of Diffusion Transformer (DiT) computations on sparse volumetric data. SSA allows the model to effectively process large token sets within sparse volumes, substantially reducing computational overhead and achieving a 3.9x speedup in the forward pass and a 9.6x speedup in the backward pass. Our framework also includes a variational autoencoder (VAE) that maintains a consistent sparse volumetric format across input, latent, and output stages. Compared to…
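To make the motivation for sparse volumes concrete, the following is a minimal sketch, not the paper's pipeline. It assumes a toy sphere SDF on a unit cube and keeps only the voxels within one voxel width of the surface; the function names, the sphere, and the band width are all illustrative assumptions. It shows that the fraction of voxels worth storing shrinks as resolution grows, since the surface scales with the square of the resolution while the dense grid scales with its cube.

```python
import math


def sphere_sdf(x, y, z, radius=0.35):
    """Signed distance from (x, y, z) to a sphere centered in the unit cube (toy shape, an assumption)."""
    return math.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2 + (z - 0.5) ** 2) - radius


def sparse_sdf_voxels(resolution, band_in_voxels=1.0):
    """Return {(i, j, k): sdf} for voxel centers within a narrow band around the surface."""
    voxel_size = 1.0 / resolution
    band = band_in_voxels * voxel_size  # band width is a hypothetical choice
    active = {}
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                d = sphere_sdf(
                    (i + 0.5) * voxel_size,
                    (j + 0.5) * voxel_size,
                    (k + 0.5) * voxel_size,
                )
                if abs(d) <= band:
                    active[(i, j, k)] = d
    return active


def occupancy_ratio(resolution):
    """Return (number of active voxels, number of voxels in the dense grid)."""
    return len(sparse_sdf_voxels(resolution)), resolution ** 3


if __name__ == "__main__":
    for res in (16, 32):
        n_active, n_dense = occupancy_ratio(res)
        print(f"{res}^3: {n_active} active of {n_dense} voxels ({n_active / n_dense:.1%})")
```

Only the active voxels would become tokens in a sparse representation, which is why the token count, and with it the attention cost, is the bottleneck that a mechanism such as SSA targets at high resolution.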

Code & Models

Repositories

DreamTechAI/Direct3D-S2 (PyTorch)

Models

wushuang98/Direct3D-S2 (Hugging Face model · 33 downloads · 79 likes)

Videos

Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention (SlidesLive)

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Computer Graphics and Visualization Techniques

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Softmax · Diffusion · Position-Wise Feed-Forward Layer · Absolute Position Encodings