Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasoning and Hierarchical Labeling

Anqi Li; Wenwei Jin; Jintao Tong; Pengda Qin; Weijia Li; Guo Lu

arXiv:2508.03296 · cs.CL · January 9, 2026


TL;DR

This paper introduces Hi-Guard, a hierarchical, policy-aligned multimodal moderation framework that improves accuracy, interpretability, and policy compliance in content moderation systems.

Contribution

The paper proposes a novel hierarchical moderation pipeline with policy-aligned prompts and a multi-level reward, enhancing transparency and effectiveness over existing methods.

Findings

01. Achieves superior classification accuracy and generalization.

02. Enhances interpretability and explanation quality.

03. Demonstrates effectiveness in real-world deployment.

Abstract

Social platforms have revolutionized information sharing, but also accelerated the dissemination of harmful and policy-violating content. To ensure safety and compliance at scale, moderation systems must go beyond efficiency and offer accuracy and interpretability. However, current approaches largely rely on noisy, label-driven learning, lacking alignment with moderation rules and producing opaque decisions that hinder human review. Therefore, we propose Hierarchical Guard (Hi-Guard), a multimodal moderation framework that introduces a new policy-aligned decision paradigm. The term "Hierarchical" reflects two key aspects of our system design: (1) a hierarchical moderation pipeline, where a lightweight binary model first filters safe content and a stronger model handles fine-grained risk classification; and (2) a hierarchical taxonomy in the second stage, where the model performs…
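The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Decision` type, the `moderate` function, the `threshold` parameter, and the toy stand-in models are all hypothetical names introduced here, and the category labels are placeholders rather than the paper's taxonomy. The sketch only shows the control flow of a cascade in which a cheap binary filter clears safe content and a stronger classifier sees only the remainder.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    safe: bool
    category: Optional[str]  # None when stage 1 clears the item
    stage: int               # which stage produced the decision


def moderate(
    item: dict,
    binary_filter: Callable[[dict], float],
    risk_classifier: Callable[[dict], str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Decision:
    """Route an item through a two-stage moderation cascade."""
    # Stage 1: lightweight model returns a risk score in [0, 1].
    risk_score = binary_filter(item)
    if risk_score < threshold:
        return Decision(safe=True, category=None, stage=1)
    # Stage 2: stronger model assigns a fine-grained risk category.
    return Decision(safe=False, category=risk_classifier(item), stage=2)


# Toy stand-ins for the two models (assumptions for illustration only).
def toy_filter(item: dict) -> float:
    return 0.9 if "scam" in item["text"] else 0.1


def toy_classifier(item: dict) -> str:
    return "fraud"  # placeholder label


benign = moderate({"text": "hello world"}, toy_filter, toy_classifier)
risky = moderate({"text": "obvious scam offer"}, toy_filter, toy_classifier)
```

In this sketch the expensive classifier runs only on items the filter flags, which is the efficiency argument for a cascade. How the second stage uses the hierarchical taxonomy is not shown, because the abstract is truncated at that point.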
