CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

Bingzhe Wu; Haotian Lu; Yuchen Mou

arXiv:2604.10504·cs.AI·April 14, 2026

TL;DR

CARO is a two-stage training framework that strengthens analogical reasoning in large language models for content moderation, reducing their reliance on misleading decision shortcuts.

Contribution

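Introduces a two-stage training method for LLM moderation: supervised fine-tuning on analogical reasoning chains bootstrapped with retrieval-augmented generation, followed by a customized direct preference optimization step that reinforces analogical reasoning.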

Findings

01 Outperforms state-of-the-art reasoning and moderation models.

02 Achieves 24.9% higher F1 score on ambiguous moderation benchmarks.

03 Effectively mitigates harmful decision shortcuts in LLMs.

Abstract

Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading "decision shortcuts" embedded in context. Inspired by cognitive psychology insights into expert moderation, we introduce CARO (Chain-of-Analogy Reasoning Optimization), a novel two-stage training framework to induce robust analogical reasoning in LLMs. First, CARO bootstraps analogical reasoning chains via retrieval-augmented generation (RAG) on moderation data and performs supervised fine-tuning (SFT). Second, we propose a customized direct preference optimization (DPO) approach to reinforce analogical reasoning behaviors explicitly. Unlike static retrieval methods, CARO dynamically generates tailored analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments demonstrate that…
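The abstract does not say how the second-stage DPO objective is customized, so the sketch below shows only the standard DPO loss for a single preference pair as a point of reference. It is not the authors' variant. The function and parameter names, and the default `beta`, are assumptions made for illustration; "chosen" stands for a response that reasons by analogy and "rejected" for one that follows a decision shortcut.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract gives no hyperparameters
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    # The loss falls as the policy favors the chosen response more strongly.
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-2.0, -2.0, -2.0, -2.0))
    # Policy prefers the chosen response: loss drops below log(2).
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

In practice the log-probabilities would come from the SFT model produced in the first stage (as the reference) and the model being optimized (as the policy), averaged over a batch of preference pairs.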
