PRJ: Perception-Retrieval-Judgement for Generated Images

Qiang Fu; Zonglei Jing; Zonghao Ying; Xiaoqian Li

arXiv:2506.03683 · cs.CV · June 5, 2025

TL;DR

PRJ is a structured, cognitively inspired framework for detecting nuanced and implicit harms in AI-generated images, improving interpretability and accuracy over existing safety systems.

Contribution

The paper introduces a three-stage perception-retrieval-judgement process that models toxicity detection as structured reasoning, which improves interpretability and helps the detector handle nuanced harms.
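One way to picture the three stages is as a chain in which each stage passes structured output to the next. The following is a toy sketch under stated assumptions, not the authors' implementation. Perception is stubbed by tokenizing a text caption instead of analyzing pixels. Retrieval is a keyword lookup against a hypothetical `KNOWLEDGE_BASE`. The function names (`perceive`, `retrieve`, `judge`, `prj_pipeline`) and the `Judgement` record are also hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical knowledge base mapping a harm category to cue words.
# These categories and cues are illustrative placeholders only.
KNOWLEDGE_BASE = {
    "violence": {"blood", "weapon", "fight"},
    "hate": {"swastika", "slur"},
}


@dataclass
class Judgement:
    is_harmful: bool
    categories: list[str] = field(default_factory=list)
    rationale: str = ""


def perceive(image_caption: str) -> set[str]:
    """Perception (stub): turn a caption into a set of observed concepts.
    A real system would inspect the image, e.g. with a vision-language model."""
    return {w.strip(".,").lower() for w in image_caption.split()}


def retrieve(concepts: set[str]) -> dict[str, set[str]]:
    """Retrieval (stub): find categories whose cues match the observed concepts."""
    return {
        cat: cues & concepts
        for cat, cues in KNOWLEDGE_BASE.items()
        if cues & concepts
    }


def judge(evidence: dict[str, set[str]]) -> Judgement:
    """Judgement (stub): decide harmfulness and explain it from retrieved evidence."""
    if not evidence:
        return Judgement(False, [], "no matching harm cues")
    reasons = "; ".join(f"{c}: {sorted(m)}" for c, m in sorted(evidence.items()))
    return Judgement(True, sorted(evidence), reasons)


def prj_pipeline(image_caption: str) -> Judgement:
    # Chain the three stages; each intermediate result stays inspectable.
    return judge(retrieve(perceive(image_caption)))
```

For example, `prj_pipeline("A man holding a weapon, blood on the floor.")` returns a harmful verdict in the `violence` category and lists the matched cues as its rationale. Keeping each stage's output explicit is what makes the final verdict traceable, which reflects the interpretability benefit the summary attributes to PRJ.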

Findings

1. PRJ outperforms existing safety checkers in detection accuracy.
2. PRJ provides improved robustness against adversarial manipulations.
3. PRJ enables categorical and interpretative analysis of harms.

Abstract

The rapid progress of generative AI has enabled remarkable creative capabilities, yet it also raises urgent concerns regarding the safety of AI-generated visual content in real-world applications such as content moderation, platform governance, and digital media regulation. This includes unsafe material such as sexually explicit images, violent scenes, hate symbols, propaganda, and unauthorized imitations of copyrighted artworks. Existing image safety systems often rely on rigid category filters and produce binary outputs, lacking the capacity to interpret context or reason about nuanced, adversarially induced forms of harm. In addition, standard evaluation metrics (e.g., attack success rate) fail to capture the semantic severity and dynamic progression of toxicity. To address these limitations, we propose Perception-Retrieval-Judgement (PRJ), a cognitively inspired framework that…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis