PRJ: Perception-Retrieval-Judgement for Generated Images
Qiang Fu, Zonglei Jing, Zonghao Ying, Xiaoqian Li

TL;DR
PRJ is a structured, cognitively inspired framework for detecting nuanced and implicit harms in AI-generated images, improving interpretability and accuracy over existing safety systems.
Contribution
It introduces a three-stage perception-retrieval-judgement process that models toxicity detection as structured reasoning, enhancing interpretability and handling nuanced harms.
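As a rough illustration of how a three-stage flow of this kind could be wired together, here is a minimal toy sketch. It is not the authors' implementation: the function names, the `KNOWLEDGE_BASE` table, the keyword-overlap retrieval, and the use of a text caption in place of a real vision model are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical knowledge base mapping a harm category to indicative terms.
# The real framework's knowledge source is not described in this summary.
KNOWLEDGE_BASE = {
    "violence": {"weapon", "blood", "fight"},
    "hate_symbol": {"swastika"},
}


@dataclass
class Verdict:
    category: str   # "safe" or one of the harm categories
    rationale: str  # human-readable explanation of the decision


def perceive(image_caption: str) -> set[str]:
    """Stage 1 (stand-in): reduce the image to a set of descriptive terms.

    A real system would run a vision model on pixels; this toy version
    assumes a caption is already available.
    """
    return {word.strip(".,").lower() for word in image_caption.split()}


def retrieve(terms: set[str]) -> dict[str, set[str]]:
    """Stage 2: collect the categories whose indicative terms were perceived."""
    return {
        category: terms & keywords
        for category, keywords in KNOWLEDGE_BASE.items()
        if terms & keywords
    }


def judge(evidence: dict[str, set[str]]) -> Verdict:
    """Stage 3: pick the best-supported category and explain the choice."""
    if not evidence:
        return Verdict("safe", "no harm-related evidence retrieved")
    category = max(evidence, key=lambda c: len(evidence[c]))
    matched = ", ".join(sorted(evidence[category]))
    return Verdict(category, f"matched terms: {matched}")


def prj(image_caption: str) -> Verdict:
    """Chain the three stages: perception, retrieval, judgement."""
    return judge(retrieve(perceive(image_caption)))
```

Calling `prj("A man holding a weapon covered in blood.")` in this sketch returns a verdict with a category and a rationale rather than a bare safe/unsafe flag, which is the kind of categorical, interpretable output the summary attributes to PRJ.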
Findings
PRJ outperforms existing safety checkers in detection accuracy.
PRJ provides improved robustness against adversarial manipulations.
PRJ enables categorical and interpretative analysis of harms.
Abstract
The rapid progress of generative AI has enabled remarkable creative capabilities, yet it also raises urgent concerns regarding the safety of AI-generated visual content in real-world applications such as content moderation, platform governance, and digital media regulation. This includes unsafe material such as sexually explicit images, violent scenes, hate symbols, propaganda, and unauthorized imitations of copyrighted artworks. Existing image safety systems often rely on rigid category filters and produce binary outputs, lacking the capacity to interpret context or reason about nuanced, adversarially induced forms of harm. In addition, standard evaluation metrics (e.g., attack success rate) fail to capture the semantic severity and dynamic progression of toxicity. To address these limitations, we propose Perception-Retrieval-Judgement (PRJ), a cognitively inspired framework that…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
