Efficient LLM Moderation with Multi-Layer Latent Prototypes

Maciej Chrabąszcz; Filip Szatkowski; Bartosz Wójcik; Jan Dubiński; Tomasz Trzciński; Sebastian Cygert

arXiv:2502.16174 · cs.LG · February 9, 2026

TL;DR

The paper introduces MLPM, a lightweight, multi-layer prototype-based moderation tool that enhances safety and efficiency in deploying large language models, outperforming existing methods across various benchmarks.

Contribution

We propose a novel multi-layer prototype approach for LLM moderation that is highly customizable, efficient, and easily integrable into existing pipelines.

Findings

1. Achieves state-of-the-art moderation performance
2. Maintains high efficiency with negligible overhead
3. Scales effectively across different model sizes

Abstract

Although modern LLMs are aligned with human values during post-training, robust moderation remains essential to prevent harmful outputs at deployment time. Existing approaches suffer from performance-efficiency trade-offs and are difficult to customize to user-specific requirements. Motivated by this gap, we introduce Multi-Layer Prototype Moderator (MLPM), a lightweight and highly customizable input moderation tool. We propose leveraging prototypes of intermediate representations across multiple layers to improve moderation quality while maintaining high efficiency. By design, our method adds negligible overhead to the generation pipeline and can be seamlessly applied to any model. MLPM achieves state-of-the-art performance on diverse moderation benchmarks and demonstrates strong scalability across model families of various sizes. Moreover, we show that it integrates smoothly into…
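To make the idea of "prototypes of intermediate representations across multiple layers" concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes one prototype per class (safe = 0, harmful = 1) per layer, computed as the class mean of that layer's features; Euclidean distance; and a plain average of per-layer scores. The paper may use a different distance and a learned combination of layers. All function names are hypothetical, and the toy 2-D vectors stand in for the hidden states a served LLM would already produce during its forward pass, which is why this kind of scoring can add little overhead.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
# layer index -> {class label -> prototype vector}
Prototypes = Dict[int, Dict[int, Vector]]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_prototypes(layer_feats: Dict[int, List[Vector]], labels: List[int]) -> Prototypes:
    """Hypothetical: one prototype (class mean) per label at every monitored layer."""
    protos: Prototypes = {}
    for layer, feats in layer_feats.items():
        protos[layer] = {}
        for c in sorted(set(labels)):
            members = [f for f, y in zip(feats, labels) if y == c]
            protos[layer][c] = mean_vector(members)
    return protos


def harm_score(sample: Dict[int, Vector], protos: Prototypes) -> float:
    """Assumed aggregation: mean over layers of (dist to safe - dist to harmful).

    A positive score means the input sits closer to the harmful prototypes.
    """
    per_layer = []
    for layer, feat in sample.items():
        d_safe = euclidean(feat, protos[layer][0])
        d_harm = euclidean(feat, protos[layer][1])
        per_layer.append(d_safe - d_harm)
    return sum(per_layer) / len(per_layer)


# Toy data: features from two (hypothetical) layers, 4 and 8, for four prompts.
train_feats = {
    4: [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]],
    8: [[1.0, 0.0], [0.0, 0.0], [4.0, 4.0], [4.0, 5.0]],
}
train_labels = [0, 0, 1, 1]  # 0 = safe, 1 = harmful

protos = fit_prototypes(train_feats, train_labels)
harmful_prompt = {4: [5.0, 5.0], 8: [4.0, 4.0]}
safe_prompt = {4: [0.0, 0.0], 8: [0.0, 0.0]}

if __name__ == "__main__":
    print(harm_score(harmful_prompt, protos) > 0)  # flagged
    print(harm_score(safe_prompt, protos) > 0)     # not flagged
```

Because fitting only requires class means over labelled examples, a user could, under this sketch, customize moderation by adding labelled prompts for a new category and recomputing prototypes without retraining the underlying model.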


Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Law · Natural Language Processing Techniques