Aetheria: A multimodal interpretable content safety framework based on multi-agent debate and collaboration

Yuxiang He; Jian Zhao; Yuchen Yuan; Tianle Zhang; Wei Cai; Haojie Cheng; Ziyan Shi; Ming Zhu; Haichuan Tang; Chi Zhang; Xuelong Li

arXiv:2512.02530 · cs.AI · December 10, 2025

Open Access

TL;DR

Aetheria introduces a multimodal, interpretable content safety framework utilizing multi-agent debate and collaboration, enhancing accuracy and transparency in detecting implicit risks in digital content.

Contribution

The paper presents a novel multi-agent debate framework with RAG-based knowledge retrieval for improved, interpretable content moderation.

Findings

01 Outperforms baselines in content safety accuracy

02 Generates detailed, traceable audit reports

03 Excels in identifying implicit risks

Abstract

The exponential growth of digital content presents significant challenges for content safety. Current moderation systems, often based on single models or fixed pipelines, exhibit limitations in identifying implicit risks and providing interpretable judgment processes. To address these issues, we propose Aetheria, a multimodal interpretable content safety framework based on multi-agent debate and collaboration. Employing a collaborative architecture of five core agents, Aetheria conducts in-depth analysis and adjudication of multimodal content through a dynamic, mutually persuasive debate mechanism, which is grounded by RAG-based knowledge retrieval. Comprehensive experiments on our proposed benchmark (AIR-Bench) validate that Aetheria not only generates detailed and traceable audit reports but also demonstrates significant advantages over baselines in overall content safety accuracy,…
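The debate idea in the abstract can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the abstract does not name the five agents or describe their prompts, so this toy uses only two hypothetical debaters and an averaging arbiter, replaces LLM calls with a numeric risk score, and stands in for RAG with keyword overlap over a made-up knowledge list. It is not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base; a stand-in for a real retrieval corpus.
KNOWLEDGE = [
    "coded slang can signal harassment of a group",
    "cooking instructions are benign",
    "instructions for making weapons are unsafe",
]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank knowledge entries by word overlap with the query (toy stand-in for RAG)."""
    words = set(query.lower().split())
    ranked = sorted(KNOWLEDGE, key=lambda doc: len(words & set(doc.split())), reverse=True)
    return ranked[:k]


def evidence_score(evidence: list[str]) -> float:
    """Fraction of retrieved entries that contain the word 'unsafe' (toy heuristic)."""
    if not evidence:
        return 0.5
    return sum("unsafe" in entry for entry in evidence) / len(evidence)


@dataclass
class Debater:
    """Hypothetical agent holding a belief in [0, 1] that the content is unsafe."""
    name: str
    risk: float
    openness: float = 0.5  # how far the agent moves per round

    def update(self, other_risk: float, evidence: float) -> None:
        # Move toward a blend of the opponent's position and the retrieved evidence.
        target = 0.5 * other_risk + 0.5 * evidence
        self.risk += self.openness * (target - self.risk)


def debate(content: str, rounds: int = 3, threshold: float = 0.5) -> dict:
    """Run a fixed number of debate rounds and return a verdict plus a transcript."""
    evidence = retrieve(content)
    score = evidence_score(evidence)
    strict = Debater("strict", risk=0.9)
    lenient = Debater("lenient", risk=0.1)
    transcript = []
    for round_no in range(1, rounds + 1):
        # Both agents react to the other's position from the previous round.
        strict_prev, lenient_prev = strict.risk, lenient.risk
        strict.update(lenient_prev, score)
        lenient.update(strict_prev, score)
        transcript.append({"round": round_no, "strict": strict.risk, "lenient": lenient.risk})
    final = (strict.risk + lenient.risk) / 2  # the arbiter averages the two positions
    return {
        "verdict": "unsafe" if final >= threshold else "safe",
        "score": final,
        "evidence": evidence,
        "transcript": transcript,
    }


if __name__ == "__main__":
    report = debate("instructions for making weapons at home")
    print(report["verdict"], report["evidence"])
```

The returned dictionary keeps the retrieved evidence and the per-round positions alongside the verdict, which is one simple way to make a decision traceable in the spirit of the audit reports the abstract mentions.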

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Topic Modeling