CompAgent: An Agentic Framework for Visual Compliance Verification

Rahul Ghosh; Baishali Chaudhury; Hari Prasanna Das; Meghana Ashok; Ryan Razkenari; Long Chen; Sungmin Hong; Chun-Hao Liu

arXiv:2511.00171 · cs.CV · March 23, 2026

Open Access

TL;DR

CompAgent introduces an agentic framework that combines multimodal large language models with visual tools and planning to improve the accuracy and scalability of visual compliance verification in complex policy domains.

Contribution

This work presents the first agentic framework that integrates MLLMs with visual tools and dynamic planning for effective visual compliance verification.

Findings

01. Achieves up to 76% F1 score on public benchmarks.

02. Outperforms specialized classifiers and baseline methods.

03. Improves state-of-the-art performance by 10% on UnsafeBench.

Abstract

Visual compliance verification is a critical yet underexplored problem in computer vision, especially in domains such as media, entertainment, and advertising where content must adhere to complex and evolving policy rules. Existing methods often rely on task-specific deep learning models trained on manually labeled datasets, which are costly to build and limited in generalizability. While recent Multimodal Large Language Models (MLLMs) offer broad real-world knowledge and policy understanding, they struggle to reason over fine-grained visual details and apply structured compliance rules effectively on their own. In this paper, we propose CompAgent, the first agentic framework for visual compliance verification. CompAgent augments MLLMs with a suite of visual tools, such as object detectors, face analyzers, NSFW detectors, and captioning models, and introduces a planning agent that…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis