Gen-AI Police Sketches with Stable Diffusion

Nicholas Fidalgo; Aaron Contreras; Katherine Harvey; Johnny Ni

arXiv:2507.18667 · cs.CV · July 28, 2025

TL;DR

This paper explores AI-driven suspect sketching using Stable Diffusion and CLIP, comparing three models and demonstrating that a simple image-to-image approach yields the most structurally accurate sketches.

Contribution

It introduces a novel LoRA fine-tuning method for CLIP within Stable Diffusion, enhancing text-to-sketch alignment in suspect image generation.
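For readers unfamiliar with LoRA, the sketch below shows the generic low-rank update it is built on: a frozen weight matrix W is adjusted by a trainable product B·A scaled by alpha / r. This is the standard LoRA formulation, not the paper's specific method; the helper names (`matmul`, `lora_effective_weight`) and the toy matrices are illustrative assumptions, and nothing here reflects which CLIP layers or hyperparameters the authors actually used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the LoRA rank.

    w: frozen weight, shape (out_features, in_features)
    a: trainable down-projection, shape (r, in_features)
    b: trainable up-projection, shape (out_features, r)
    """
    r = len(a)                 # rank = number of rows in A
    scale = alpha / r
    delta = matmul(b, a)       # (out, r) @ (r, in) -> (out, in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Toy example (hypothetical values): a 2x2 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]               # shape (1, 2)
B = [[0.5], [0.0]]             # shape (2, 1)
W_eff = lora_effective_weight(W, A, B, alpha=2.0)

# With B initialised to zeros (the usual LoRA starting point),
# the adapted weight equals the frozen weight.
B_zero = [[0.0], [0.0]]
W_init = lora_effective_weight(W, A, B_zero, alpha=2.0)
```

Only A and B are trained, so the number of trainable parameters grows with the rank r rather than with the full size of W. That is why this kind of adapter is commonly applied to individual attention projections.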

Findings

01

Model 1 achieved an SSIM of 0.72 and a PSNR of 25 dB

02

Fine-tuning both self- and cross-attention layers improved alignment

03

Model 1 produced the clearest facial features in sketches

Abstract

This project investigates multimodal AI-driven approaches to automating and enhancing suspect sketching. Three pipelines were developed and evaluated: (1) a baseline image-to-image Stable Diffusion model; (2) the same model integrated with a pre-trained CLIP model for text-image alignment; and (3) a novel approach that applies LoRA fine-tuning to the self-attention and cross-attention layers of the CLIP model and integrates the result with Stable Diffusion. An ablation study confirmed that fine-tuning both self- and cross-attention layers yielded the best alignment between text descriptions and sketches. Performance testing revealed that Model 1 achieved the highest structural similarity (SSIM) of 0.72 and a peak signal-to-noise ratio (PSNR) of 25 dB, outperforming Model 2 and Model 3. Iterative refinement enhanced perceptual similarity (LPIPS), with Model 3 showing improvement over Model 2…
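To make the reported PSNR figure concrete, here is a minimal sketch of the standard definition, PSNR = 10 · log10(MAX² / MSE), over flattened pixel lists. The function name `psnr`, the 8-bit default `max_value=255.0`, and the toy inputs are assumptions for illustration; the summary does not say how the authors computed their metrics or what pixel range they used.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    max_value is the largest possible pixel value (255 assumes 8-bit images).
    """
    if not reference or len(reference) != len(generated):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf        # identical images: no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy inputs (hypothetical): four grey-level pixels.
ref = [0, 0, 0, 0]
worst = [255, 255, 255, 255]   # maximally different from ref
same_score = psnr(ref, ref)
worst_score = psnr(ref, worst)
```

Under the 8-bit assumption, a PSNR of 25 dB corresponds to a root-mean-square pixel error of roughly 14 grey levels, since 255 / 10^(25/20) ≈ 14.3.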

Taxonomy

Topics: Face recognition and analysis · Face Recognition and Perception · Generative Adversarial Networks and Image Synthesis