TL;DR
Voxify3D introduces a novel differentiable framework that combines 3D mesh optimization with pixel art supervision to generate stylized voxel art with semantic and aesthetic fidelity.
Contribution
It integrates orthographic pixel art supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization to improve voxel art generation from 3D meshes.
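To illustrate why orthographic supervision helps with voxel-pixel alignment, here is a minimal sketch, not the paper's renderer. It compares where a voxel center lands on an axis-aligned image under orthographic and pinhole perspective projection. The function names, the grid and image sizes, and the camera distance are all assumptions chosen for illustration.

```python
def orthographic_pixel(x, y, z, grid_size, image_size):
    """Project a voxel index along the z axis; depth z is ignored entirely."""
    scale = image_size / grid_size
    return (int((x + 0.5) * scale), int((y + 0.5) * scale))


def perspective_pixel(x, y, z, grid_size, image_size, camera_distance):
    """Pinhole projection with the camera on the grid's central axis.

    The focal length is set equal to camera_distance (an illustrative
    choice) so that the z = 0 slice maps one-to-one onto the image.
    """
    scale = image_size / grid_size
    center = grid_size / 2.0
    factor = camera_distance / (camera_distance + z)  # shrinks with depth
    px = center + (x + 0.5 - center) * factor
    py = center + (y + 0.5 - center) * factor
    return (int(px * scale), int(py * scale))


# Hypothetical setup: a 32^3 grid rendered to a 32x32 image.
grid_size, image_size = 32, 32
near_ortho = orthographic_pixel(5, 7, 0, grid_size, image_size)
far_ortho = orthographic_pixel(5, 7, 31, grid_size, image_size)
near_persp = perspective_pixel(5, 7, 0, grid_size, image_size, camera_distance=64.0)
far_persp = perspective_pixel(5, 7, 31, grid_size, image_size, camera_distance=64.0)

print("orthographic:", near_ortho, far_ortho)
print("perspective: ", near_persp, far_persp)
```

In this sketch the orthographic pixel depends only on the voxel's (x, y) index, so every voxel in a column along the view axis lands on the same pixel. Under perspective, the same column drifts toward the image center as depth grows, which breaks the one-voxel-to-one-pixel correspondence.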
Findings
Achieves a CLIP-IQA score of 37.12 and a 77.90% user preference rate in experiments.
Supports controllable abstraction with palettes of 2 to 8 colors and resolutions from 20x to 50x.
Abstract
Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting requirements of geometric abstraction, semantic preservation, and discrete color coherence. Existing methods either over-simplify geometry or fail to achieve the pixel-precise, palette-constrained aesthetics of voxel art. We introduce Voxify3D, a differentiable two-stage framework bridging 3D mesh optimization with 2D pixel art supervision. Our core innovation lies in the synergistic integration of three components: (1) orthographic pixel art supervision that eliminates perspective distortion for precise voxel-pixel alignment; (2) patch-based CLIP alignment that preserves semantics across discretization levels; (3) palette-constrained Gumbel-Softmax quantization enabling differentiable optimization over discrete color spaces with…
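The palette-constrained Gumbel-Softmax idea in component (3) can be sketched generically as follows. This is a plain-Python illustration of the standard Gumbel-Softmax relaxation applied to a fixed color palette, not the authors' implementation. The palette values, the logits, the temperature `tau`, and all function names are assumptions. A real system would use an autodiff framework so gradients can flow into the logits.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Return relaxed one-hot weights over palette entries."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 to keep the logs finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def soft_color(weights, palette):
    """Blend palette colors with the relaxed weights (used during optimization)."""
    return tuple(sum(w * color[k] for w, color in zip(weights, palette)) for k in range(3))


def hard_color(logits, palette):
    """Snap to the single most likely palette entry (used for the final voxel)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return palette[best]


# Hypothetical 4-color palette (RGB in [0, 1]) and per-voxel logits.
palette = [(0.9, 0.1, 0.1), (0.1, 0.8, 0.2), (0.1, 0.2, 0.9), (0.95, 0.95, 0.95)]
logits = [2.0, 0.5, -1.0, 0.0]
rng = random.Random(0)

weights = gumbel_softmax(logits, tau=0.5, rng=rng)
blended = soft_color(weights, palette)
final = hard_color(logits, palette)

print("weights:", weights)
print("blended:", blended)
print("final:  ", final)
```

The design point is that the blended color stays a smooth function of the logits, so it can be optimized with gradients, while the final voxel color is always an exact palette entry. Lowering `tau` pushes the relaxed weights closer to one-hot.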
