JurEE not Judges: safeguarding llm interactions with small, specialised   Encoder Ensembles

Dom Nasrabadi

arXiv:2410.08442·cs.LG·October 15, 2024

JurEE not Judges: safeguarding llm interactions with small, specialised Encoder Ensembles

Dom Nasrabadi

PDF

Open Access

TL;DR

JurEE is an ensemble of specialized encoder-only transformer models that provides probabilistic risk assessments for AI-User interactions, outperforming existing methods in accuracy, speed, and cost-efficiency for content moderation tasks.

Contribution

This paper introduces JurEE, a novel ensemble of encoder-only transformers that offers robust, interpretable, and efficient risk estimation across diverse safety scenarios in LLM-based systems.

Findings

01

JurEE significantly outperforms baseline models in accuracy and speed.

02

JurEE demonstrates superior cost-efficiency for large-scale moderation.

03

The modular design allows customizable risk thresholds for various applications.

Abstract

We introduce JurEE, an ensemble of efficient, encoder-only transformer models designed to strengthen safeguards in AI-User interactions within LLM-based systems. Unlike existing LLM-as-Judge methods, which often struggle with generalization across risk taxonomies and only provide textual outputs, JurEE offers probabilistic risk estimates across a wide range of prevalent risks. Our approach leverages diverse data sources and employs progressive synthetic data generation techniques, including LLM-assisted augmentation, to enhance model robustness and performance. We create an in-house benchmark comprising of other reputable benchmarks such as the OpenAI Moderation Dataset and ToxicChat, where we find JurEE significantly outperforms baseline models, demonstrating superior accuracy, speed, and cost-efficiency. This makes it particularly suitable for applications requiring stringent content…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsArtificial Intelligence in Law

MethodsSparse Evolutionary Training