Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling
Anqi Li, Wenwei Jin, Jintao Tong, Pengda Qin, Weijia Li, Guo Lu

TL;DR
This paper introduces Hi-Guard, a hierarchical, policy-aligned multimodal moderation framework that improves accuracy, interpretability, and policy compliance in content moderation systems.
Contribution
The paper proposes a novel hierarchical moderation pipeline with policy-aligned prompts and a multi-level reward, enhancing transparency and effectiveness over existing methods.
Findings
Achieves higher classification accuracy and better generalization than existing methods.
Enhances interpretability and explanation quality.
Demonstrates effectiveness in real-world deployment.
Abstract
Social platforms have revolutionized information sharing, but also accelerated the dissemination of harmful and policy-violating content. To ensure safety and compliance at scale, moderation systems must go beyond efficiency and offer accuracy and interpretability. However, current approaches largely rely on noisy, label-driven learning, lacking alignment with moderation rules and producing opaque decisions that hinder human review. Therefore, we propose Hierarchical Guard (Hi-Guard), a multimodal moderation framework that introduces a new policy-aligned decision paradigm. The term "Hierarchical" reflects two key aspects of our system design: (1) a hierarchical moderation pipeline, where a lightweight binary model first filters safe content and a stronger model handles fine-grained risk classification; and (2) a hierarchical taxonomy in the second stage, where the model performs…
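The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Decision`, `moderate`, `toy_filter`, and `toy_classifier` are hypothetical, and the two models are replaced by plain callables so the control flow can be read in isolation. It assumes only what the abstract states, namely that a lightweight binary model screens out safe content first and a stronger model classifies whatever remains.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    flagged: bool                   # True if stage 1 did not clear the item as safe
    category: Optional[str] = None  # fine-grained label, set only by stage 2


def moderate(
    item: str,
    binary_filter: Callable[[str], bool],
    fine_classifier: Callable[[str], str],
) -> Decision:
    """Route an item through a two-stage moderation cascade (illustrative only)."""
    # Stage 1: the lightweight binary model returns True for possibly risky content.
    if not binary_filter(item):
        return Decision(flagged=False)
    # Stage 2: the stronger model is called only for items that stage 1 flagged.
    return Decision(flagged=True, category=fine_classifier(item))


# Toy stand-ins for the two models (assumptions, not real classifiers).
def toy_filter(item: str) -> bool:
    return "scam" in item.lower()


def toy_classifier(item: str) -> str:
    return "fraud"


safe_result = moderate("hello world", toy_filter, toy_classifier)
risky_result = moderate("Scam offer inside", toy_filter, toy_classifier)
```

The point of the cascade is cost: the stronger model runs only on the subset of items the binary filter does not clear, so most safe content never reaches the expensive stage.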
