Controllable Skin Synthesis via Lesion-Focused Vector Autoregression Model

Jiajun Sun; Zhen Yu; Siyuan Yan; Jason J. Ong; Zongyuan Ge; Lei Zhang

arXiv:2508.19626 · cs.CV · August 28, 2025

TL;DR

This paper introduces LF-VAR, a model for controllable, high-fidelity skin image synthesis that conditions generation on lesion measurements and lesion types. Compared with prior methods, it improves image quality (a 6.3% gain over the previous state of the art) and gives finer control over lesion features.

Contribution

LF-VAR is the first model to integrate lesion measurements and types into a structured tokenization and autoregressive framework for controllable skin image synthesis.
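The autoregressive half of this design follows the Visual AutoRegressive (VAR) scheme of next-scale prediction: instead of emitting one token at a time, the transformer emits an entire token map per scale, conditioned on all coarser maps. In attention terms that is a block-causal mask. The sketch below builds such a mask with NumPy for square token maps. The function name `block_causal_mask` and the scale list `(1, 2, 3)` are hypothetical illustrations; the scales LF-VAR actually uses, and how it injects the lesion conditioning, are not specified here.

```python
import numpy as np


def block_causal_mask(scales):
    """Attention mask for next-scale prediction over square token maps.

    scales: side lengths of the token maps, ordered coarse to fine.
    mask[i, j] is True when token i may attend to token j, i.e. when token j
    belongs to the same scale as token i or to a coarser one.
    """
    # Scale index of every token once all maps are flattened coarse to fine.
    scale_ids = np.concatenate([np.full(s * s, k) for k, s in enumerate(scales)])
    return scale_ids[:, None] >= scale_ids[None, :]


mask = block_causal_mask((1, 2, 3))  # 1 + 4 + 9 = 14 tokens
print(mask.shape)     # (14, 14)
print(mask[1].sum())  # 5: the 1x1 token plus the four 2x2 tokens
```

Because tokens within one scale see each other and every coarser token, all tokens of a scale can be sampled in parallel in a single forward step, which is what makes coarse-to-fine generation cheaper than raster-order decoding.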

Findings

01. Achieved the best FID score, 0.74, across seven lesion types

02. Improved synthesis quality over the previous state of the art by 6.3%

03. Enabled control over lesion location and type using language prompts
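The FID figures above use the Fréchet Inception Distance, which compares the mean and covariance of image features from real and generated sets: FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), where lower is better. The sketch below computes it from precomputed feature arrays with NumPy and SciPy. It assumes features have already been extracted (the standard metric uses Inception-v3 pooling features), and the helper names are hypothetical. It does not reproduce the paper's evaluation protocol, so its values are not comparable to the reported 0.74.

```python
import numpy as np
from scipy import linalg


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Principal square root of sigma1 @ sigma2; tiny imaginary parts can appear
    # from numerical error, so keep only the real part.
    covmean = np.real(linalg.sqrtm(sigma1 @ sigma2))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(covmean))


def fid_from_features(feats_real, feats_gen):
    """FID from (n_samples, dim) feature arrays of real and generated images."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
print(fid_from_features(real, real))        # approximately 0: identical statistics
print(fid_from_features(real, real + 1.0))  # approximately 8: unit mean shift in each of 8 dims
```

In practice the covariance estimate needs many more samples than feature dimensions, which is one reason FID values from different pipelines are hard to compare directly.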

Abstract

Skin images from real-world clinical practice are often limited, resulting in a shortage of training data for deep-learning models. While many studies have explored skin image synthesis, existing methods often generate low-quality images and lack control over the lesion's location and type. To address these limitations, we present LF-VAR, a model leveraging quantified lesion measurement scores and lesion type labels to guide the clinically relevant and controllable synthesis of skin images. It enables controlled skin synthesis with specific lesion characteristics based on language prompts. We train a multiscale lesion-focused Vector Quantised Variational Auto-Encoder (VQVAE) to encode images into discrete latent representations for structured tokenization. Then, a Visual AutoRegressive (VAR) Transformer trained on tokenized representations facilitates image synthesis. Lesion measurement…
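The structured tokenization described in the abstract builds on VAR's multi-scale residual quantization: the encoder's feature map is quantized from coarse to fine, and each scale encodes only what the coarser scales left unexplained. Below is a minimal NumPy sketch of that generic procedure, not LF-VAR's lesion-focused tokenizer. The random feature map standing in for encoder output, the shared codebook, the scales `(1, 2, 4, 8)`, block averaging for downsampling, nearest-neighbor upsampling, and all function names are assumptions; the original VAR also passes each upsampled map through a learned convolution, which is omitted here.

```python
import numpy as np


def quantize(z, codebook):
    """Nearest-codeword lookup. z: (s, s, c); codebook: (K, c)."""
    dists = ((z[..., None, :] - codebook) ** 2).sum(axis=-1)  # (s, s, K)
    idx = dists.argmin(axis=-1)                               # (s, s) token ids
    return idx, codebook[idx]


def downsample(f, s):
    """Average-pool an (h, h, c) map to (s, s, c); assumes s divides h."""
    h, _, c = f.shape
    k = h // s
    return f.reshape(s, k, s, k, c).mean(axis=(1, 3))


def upsample(q, h):
    """Nearest-neighbor upsample an (s, s, c) map to (h, h, c)."""
    k = h // q.shape[0]
    return np.repeat(np.repeat(q, k, axis=0), k, axis=1)


def multiscale_tokenize(f, codebook, scales):
    """Residual quantization from coarse to fine; returns token maps and reconstruction."""
    residual = f.copy()
    recon = np.zeros_like(f)
    tokens = []
    for s in scales:
        idx, q = quantize(downsample(residual, s), codebook)
        tokens.append(idx)
        up = upsample(q, f.shape[0])
        recon += up     # accumulate the coarse-to-fine approximation
        residual -= up  # the next scale only has to explain what is left
    return tokens, recon


rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))  # stand-in for encoder output
codebook = np.vstack([np.zeros(4), rng.normal(size=(63, 4))])
tokens, recon = multiscale_tokenize(feat, codebook, scales=(1, 2, 4, 8))
print([t.shape for t in tokens])  # [(1, 1), (2, 2), (4, 4), (8, 8)]
```

The zero vector is included in this toy codebook so that no scale can increase the residual error; a learned codebook carries no such guarantee. The flattened token maps are what a VAR-style transformer would then model scale by scale.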