Beyond Imperfect Alternatives with Rulemapping: A Neuro-Symbolic Case Study on Online Hate Speech
Oskar von Cossel

TL;DR
This study explores a neuro-symbolic approach combining large language models with symbolic logic to improve online hate speech classification, aiming for transparent and legally compliant moderation.
Contribution
It introduces a hybrid Rulemapping method that constrains LLMs within symbolic structures, enhancing legal decision accuracy and verifiability in content moderation.
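The idea of constraining an LLM within a symbolic structure can be illustrated with a minimal sketch. It is not the paper's implementation: the `Element`, `Rule`, and `assess` names, the example rule, and the canned answers are all hypothetical. The assumption is that the model only answers narrow yes/no questions about individual legal elements, while the final verdict is computed deterministically from those answers and every answer is kept for audit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical interface: (question, text) -> yes/no answer.
# In a real system this would wrap an LLM call; here it is a plain function.
Answerer = Callable[[str, str], bool]


@dataclass(frozen=True)
class Element:
    key: str       # short identifier used in the audit trace
    question: str  # narrow yes/no question posed to the model


@dataclass(frozen=True)
class Rule:
    name: str
    elements: Tuple[Element, ...]  # all elements must hold (logical AND)


def assess(text: str, rule: Rule, answer: Answerer) -> Tuple[bool, Dict[str, bool]]:
    """Ask one question per element, then combine the answers deterministically."""
    trace: Dict[str, bool] = {}
    for element in rule.elements:
        # No short-circuiting: every element is recorded so the result is auditable.
        trace[element.key] = answer(element.question, text)
    return all(trace.values()), trace


# Illustrative rule with made-up elements; not taken from any statute.
example_rule = Rule(
    name="illustrative_offence",
    elements=(
        Element("targets_group", "Does the text target a protected group?"),
        Element("incites_hatred", "Does the text incite hatred against that group?"),
    ),
)

# Canned answers standing in for model output on one offensive but lawful post.
canned = {
    "Does the text target a protected group?": True,
    "Does the text incite hatred against that group?": False,
}
verdict, trace = assess("<post text>", example_rule, lambda q, _t: canned[q])
print(verdict, trace)
```

Because the verdict is a fixed function of the per-element answers, a reviewer can see which element failed rather than receiving a single opaque label.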
Findings
Rulemapping achieves high recall (0.82-0.89) and precision (0.80-0.86) across LLMs.
Unconstrained prompting yields lower precision (0.34-0.49).
Symbolic scaffolds enable robust, auditable legal automation.
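For readers less familiar with the metrics, the sketch below shows how precision and recall are computed from confusion counts. The counts are invented for illustration and are not data from the paper. They show how a classifier can keep recall high while precision falls: it finds most illegal posts but also flags many lawful ones as illegal.

```python
def precision(tp: int, fp: int) -> float:
    """Share of posts flagged as illegal that really are illegal."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of truly illegal posts that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 40 correct flags, 60 lawful posts wrongly flagged,
# 10 illegal posts missed.
tp, fp, fn = 40, 60, 10
p = precision(tp, fp)  # 40 / 100
r = recall(tp, fn)     # 40 / 50
print(f"precision={p:.2f} recall={r:.2f}")
```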
Abstract
Automating legal reasoning forces a choice between imperfect alternatives: symbolic systems offer transparency but struggle with ambiguity, whereas neural systems handle natural language flexibly but lack verifiability. This paper investigates whether a hybrid, neuro-symbolic approach can reconcile this trade-off. We evaluate this architecture in the domain of online content moderation, which serves as a proxy for high-volume legal decision-making such as mass administrative proceedings. In these settings, operators must assess thousands of cases daily under strict legal standards. Specifically, we examine whether constraining large language models (LLMs) within deterministic symbolic scaffolds improves statute-grounded illegality assessment while preventing "scope drift" (where LLMs conflate moral offensiveness with legal illegality). We evaluate the neuro-symbolic variant of…
