IPAD-CLIP: Teaching CLIP to Detect Image Local Perceptual Artifacts
Juan Wang, Xinyu Sun, Ke Zhang, Jin Wang, Bing Li, Weiming Hu, Liang Wang

TL;DR
This paper introduces IPAD-CLIP, a CLIP-based framework for detecting local perceptual artifacts in images. It is accompanied by a new benchmark dataset, and the authors report that it outperforms existing methods.
Contribution
The paper formalizes the Image Perceptual Artifact Detection (IPAD) task, provides a new dataset with pixel-level artifact masks, and develops IPAD-CLIP, a CLIP-based model for detecting local artifacts.
Findings
IPAD-CLIP significantly outperforms existing anomaly detection methods.
The dataset includes 3,520 artifact images (520 real-captured and 3,000 synthetic), each with pixel-level masks across three artifact categories.
Local artifacts are better detected using artifact-aware text embeddings.
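As a rough illustration of the last finding, the sketch below scores each image patch against a "clean" and an "artifact" text embedding, thresholds the scores into a mask, and compares that mask with a ground-truth mask by IoU. It is not the IPAD-CLIP implementation: the function names `artifact_map` and `iou`, the two-prompt setup, the temperature value, and the toy embeddings (stand-ins for real CLIP patch and text features) are all assumptions made for illustration.

```python
import numpy as np

def artifact_map(patch_emb, text_emb, temperature=0.07):
    """Per-patch artifact probability from patch/text cosine similarity.

    patch_emb: (H, W, D) patch embeddings (hypothetical stand-in for CLIP features).
    text_emb:  (2, D) text embeddings, rows = [clean prompt, artifact prompt].
    """
    p = patch_emb / np.linalg.norm(patch_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = p @ t.T / temperature                # (H, W, 2) scaled cosine similarities
    logits -= logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs[..., 1]                          # probability of the artifact prompt

def iou(pred_mask, gt_mask):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union else 1.0

# Toy data: two orthogonal directions play the role of "clean" and "artifact".
D = 8
clean_dir, artifact_dir = np.eye(D)[0], np.eye(D)[1]
text_emb = np.stack([clean_dir, artifact_dir])

gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True                               # a small local artifact region

patch_emb = np.where(gt[..., None], artifact_dir, clean_dir).astype(float)

amap = artifact_map(patch_emb, text_emb)
pred = amap > 0.5
score = iou(pred, gt)
```

With real features, `patch_emb` would come from an image encoder's patch tokens and `text_emb` from prompts describing clean and artifact-affected regions; the threshold of 0.5 is likewise an arbitrary choice here.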
Abstract
Current image quality assessment methods are heavily biased towards global distortions (e.g., noise, blur), neglecting local perceptual artifacts such as ghosting, lens flare, and moire effects. Although significant progress has been made in artifact removal, the fundamental problem of automatic artifact detection remains largely unexplored. In this paper, we formalize the Image Perceptual Artifact Detection (IPAD) task to address this gap. We contribute a benchmark dataset comprising 3,520 artifact images, including 520 real-captured and 3,000 synthetic samples, each paired with pixel-level masks across three representative artifact categories. The core challenge of IPAD lies in the localized, subtle, and semantically weak nature of these artifacts, which makes them prone to missed detection. To overcome this, we introduce IPAD-CLIP, a novel framework built upon CLIP that enhances…
