From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation

Tianle Gu; Kexin Huang; Lingyu Li; Ruilin Luo; Shiyang Huang; Zongqi Wang; Yujiu Yang; Yan Teng; Yingchun Wang

arXiv:2602.02536·cs.LG·February 4, 2026

TL;DR

This paper introduces UniMod, a novel multimodal safety moderation framework that employs dense reasoning trajectories and multi-attribute supervision to improve detection of harmful content with less training data.

Contribution

The paper proposes UniMod, a multi-attribute trajectory paradigm with structured reasoning and a multi-head reward model, advancing multimodal moderation beyond binary labels.

Findings

01 Achieves competitive textual moderation performance

02 Sets a new multimodal benchmark with less than 40% of training data

03 Validates effectiveness of multi-attribute trajectory reasoning

Abstract

Safety moderation is pivotal for identifying harmful content. Despite the success of textual safety moderation, its multimodal counterpart remains hindered by a dual sparsity of data and supervision. Conventional reliance on binary labels leads to shortcut learning, which obscures the intrinsic classification boundaries necessary for effective multimodal discrimination. Hence, we propose a novel learning paradigm (UniMod) that transitions from sparse decision-making to dense reasoning traces. By constructing structured trajectories encompassing evidence grounding, modality assessment, risk mapping, policy decision, and response generation, we reformulate monolithic decision tasks into a multi-dimensional boundary learning process. This approach forces the model to ground its decision in explicit safety semantics, preventing it from converging on superficial shortcuts. To…
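To make the five-stage trajectory concrete, here is a minimal Python sketch of how one such structured trace might be represented and serialized. It is an illustration only, not the paper's actual format: the class name `ModerationTrajectory`, the field names, the tag-based text layout, and the example values are all assumptions chosen to mirror the stage names listed in the abstract.

```python
from dataclasses import dataclass, fields
import re


@dataclass
class ModerationTrajectory:
    # One field per stage named in the abstract.
    # Field names and the use of free-text strings are assumptions.
    evidence_grounding: str
    modality_assessment: str
    risk_mapping: str
    policy_decision: str
    response_generation: str


def to_text(traj: ModerationTrajectory) -> str:
    """Serialize a trajectory as one tagged line per stage (hypothetical layout)."""
    return "\n".join(
        f"<{f.name}>{getattr(traj, f.name)}</{f.name}>" for f in fields(traj)
    )


def from_text(text: str) -> ModerationTrajectory:
    """Parse the tagged layout back; raise if any stage is missing."""
    values = {}
    for f in fields(ModerationTrajectory):
        match = re.search(rf"<{f.name}>(.*?)</{f.name}>", text, flags=re.DOTALL)
        if match is None:
            raise ValueError(f"missing stage: {f.name}")
        values[f.name] = match.group(1).strip()
    return ModerationTrajectory(**values)


# Made-up example values, for illustration only.
example = ModerationTrajectory(
    evidence_grounding="Image shows a caption targeting a group.",
    modality_assessment="Risk arises from the text overlay, not the photo.",
    risk_mapping="hate_speech",
    policy_decision="unsafe",
    response_generation="I can't help with content that demeans a group.",
)
serialized = to_text(example)
restored = from_text(serialized)
```

Keeping each stage as a separate, checkable field is what distinguishes this kind of dense trace from a single binary label: a missing or inconsistent stage can be detected on its own rather than only through the final decision.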

Code & Models

Datasets

Carol0110/UniReward
dataset · 27 dl

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning · Topic Modeling