Controllable Skin Synthesis via Lesion-Focused Vector Autoregression Model
Jiajun Sun, Zhen Yu, Siyuan Yan, Jason J. Ong, Zongyuan Ge, Lei Zhang

TL;DR
This paper introduces LF-VAR, a model for controllable, high-fidelity skin image synthesis that conditions generation on lesion measurement scores and lesion type labels. It improves image quality and control over lesion features compared with prior methods.
Contribution
LF-VAR is the first model to integrate lesion measurements and types into a structured tokenization and autoregressive framework for controllable skin image synthesis.
Findings
Achieved the best FID score of 0.74 among seven lesion types
Improved synthesis quality over previous state-of-the-art by 6.3%
Enabled control over lesion location and type using language prompts
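For context on the FID figures above: the summary does not say how the score was computed, but the Fréchet Inception Distance is conventionally defined as the Fréchet distance between two Gaussians fitted to feature embeddings of real and generated images. The symbols below are standard notation and an assumption here, not taken from the paper: \mu_r, \Sigma_r are the mean and covariance of real-image features, and \mu_g, \Sigma_g those of generated-image features. Lower values indicate closer distributions.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```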
Abstract
Skin images from real-world clinical practice are often limited, resulting in a shortage of training data for deep-learning models. While many studies have explored skin image synthesis, existing methods often generate low-quality images and lack control over the lesion's location and type. To address these limitations, we present LF-VAR, a model leveraging quantified lesion measurement scores and lesion type labels to guide the clinically relevant and controllable synthesis of skin images. It enables controlled skin synthesis with specific lesion characteristics based on language prompts. We train a multiscale lesion-focused Vector Quantised Variational Auto-Encoder (VQVAE) to encode images into discrete latent representations for structured tokenization. Then, a Visual AutoRegressive (VAR) Transformer trained on tokenized representations facilitates image synthesis. Lesion measurement…
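The tokenization step mentioned in the abstract can be illustrated with a toy sketch of vector quantization, the core operation of a VQVAE: each continuous latent vector is replaced by the index of its nearest codebook entry, and those indices are the discrete tokens an autoregressive Transformer would be trained on. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the two-dimensional codebook, and the latent values are all hypothetical, and the multiscale and lesion-focused aspects of LF-VAR are not modeled.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry."""
    tokens = []
    for z in latents:
        distances = [squared_distance(z, e) for e in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look up the codebook vector for each token (what a decoder would receive)."""
    return [list(codebook[t]) for t in tokens]


# Hypothetical toy values: a 3-entry codebook and 3 encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

tokens = quantize(latents, codebook)
reconstruction = dequantize(tokens, codebook)
print(tokens)
```

In a trained VQVAE the codebook is learned jointly with the encoder and decoder; here it is fixed by hand purely to show the lookup.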
