BLOCK: An Open-Source Bi-Stage MLLM Character-to-Skin Pipeline for Minecraft
Hengquan Guo

TL;DR
BLOCK is an open-source pipeline that generates Minecraft skins from character concepts in two stages: a multimodal model synthesizes a 3D preview, and a fine-tuned FLUX.2 model decodes that preview into a skin atlas. Training follows a progressive LoRA curriculum called EvolveLoRA.
Contribution
The paper introduces a bi-stage pipeline (3D preview synthesis followed by skin decoding) and EvolveLoRA, a progressive LoRA curriculum intended to improve stability and efficiency in character-to-skin generation.
Findings
Produces pixel-perfect Minecraft skins from arbitrary concepts.
Utilizes a large multimodal model for consistent 3D preview synthesis.
Employs a fine-tuned FLUX.2 model for accurate skin decoding.
Abstract
We present BLOCK, an open-source bi-stage character-to-skin pipeline that generates pixel-perfect Minecraft skins from arbitrary character concepts. BLOCK decomposes the problem into (i) a 3D preview synthesis stage driven by a large multimodal model (MLLM) with a carefully designed prompt-and-reference template, producing a consistent dual-panel (front/back) oblique-view Minecraft-style preview; and (ii) a skin decoding stage based on a fine-tuned FLUX.2 model that translates the preview into a skin atlas image. We further propose EvolveLoRA, a progressive LoRA curriculum (text-to-image → image-to-image → preview-to-skin) that initializes each phase from the previous adapter to improve stability and efficiency. BLOCK is released with all prompt templates and fine-tuned weights to support reproducible character-to-skin…
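The warm-start idea behind EvolveLoRA can be illustrated with a minimal sketch. Only the phase order and the rule that each phase is initialized from the previous adapter come from the abstract; the `Adapter` class, `train_phase`, and `run_curriculum` are hypothetical names invented for this illustration, no real training is performed, and starting the first phase from a fresh adapter is an assumption. This is not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Phase order as listed in the abstract; everything else is illustrative.
PHASES = ["text-to-image", "image-to-image", "preview-to-skin"]


@dataclass
class Adapter:
    """Hypothetical stand-in for a set of LoRA weights."""
    phase: str
    # Phases whose adapters this one was (transitively) initialized from.
    lineage: List[str] = field(default_factory=list)


def train_phase(phase: str, init: Optional[Adapter]) -> Adapter:
    """Placeholder for one fine-tuning phase (hypothetical helper).

    A real implementation would copy `init`'s weights into the LoRA layers
    before optimizing on this phase's data. Here we only record the lineage.
    """
    inherited = init.lineage + [init.phase] if init is not None else []
    return Adapter(phase=phase, lineage=inherited)


def run_curriculum(phases: List[str]) -> Adapter:
    """Run the phases in order, warm-starting each from the previous adapter."""
    adapter: Optional[Adapter] = None  # assumption: phase one starts fresh
    for phase in phases:
        adapter = train_phase(phase, init=adapter)
    return adapter


final = run_curriculum(PHASES)
print(final.phase, final.lineage)
```

The point of the design is that the final preview-to-skin adapter never starts from scratch: it inherits whatever the two earlier, easier phases learned, which is the mechanism the abstract credits for improved stability and efficiency.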
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
