BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs

Ilias Aarab

arXiv:2603.11991·cs.CL·March 13, 2026

Open Access

TL;DR

This paper introduces BTZSC, a comprehensive benchmark for zero-shot text classification across diverse models and datasets, systematically comparing NLI cross-encoders, embedding models, rerankers, and LLMs to evaluate their performance and trade-offs.

Contribution

The paper presents BTZSC, a new benchmark with 22 datasets, and provides a systematic comparison of four major model families for zero-shot text classification.

Findings

01. Rerankers like Qwen3-Reranker-8B achieve state-of-the-art macro F1 of 0.72.

02. Embedding models such as GTE-large-en-v1.5 offer a good balance between accuracy and latency.

03. Instruction-tuned LLMs reach macro F1 up to 0.67, especially excelling in topic classification.
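
The findings above are reported as macro F1. As a reminder of what that metric measures, here is a minimal sketch of the standard per-class definition: F1 is computed for each class separately and then averaged without weighting, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels are illustrative assumptions, and how BTZSC aggregates scores across its 22 datasets is not stated on this page.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (hypothetical labels, not BTZSC data)
y_true = ["pos", "pos", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu"]
print(round(macro_f1(y_true, y_pred), 4))
```

In the toy example, "pos" and "neg" each score 2/3 and "neu" scores 1, so the macro average is 7/9.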

Abstract

Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches have predominantly relied on cross-encoder models fine-tuned for natural language inference (NLI), recent advances in text-embedding models, rerankers, and instruction-tuned large language models (LLMs) have challenged the dominance of NLI-based architectures. Yet, systematically comparing these diverse approaches remains difficult. Existing evaluations, such as MTEB, often incorporate labeled examples through supervised probes or fine-tuning, leaving genuine zero-shot capabilities underexplored. To address this, we introduce BTZSC, a comprehensive benchmark of 22 public datasets spanning sentiment, topic, intent, and emotion classification, capturing diverse domains, class cardinalities, and…
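
To make "matching texts directly to human-readable label descriptions" concrete, the sketch below scores a text against each label description by cosine similarity and picks the best match, with no labeled training examples. It is an illustration only: the `embed` function is a toy bag-of-words stand-in for a real text-embedding model, and the label descriptions, function names, and example sentence are all hypothetical rather than taken from the benchmark.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for an embedding model: a sparse bag-of-words count vector.
    # A real system would call a trained text-embedding model here.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def zero_shot_classify(text, label_descriptions):
    # Score the text against every label description; no labeled examples used.
    text_vec = embed(text)
    scores = {
        label: cosine(text_vec, embed(description))
        for label, description in label_descriptions.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical label descriptions, not drawn from BTZSC
labels = {
    "sports": "this text is about sports teams and games",
    "finance": "this text is about money banks and markets",
}
predicted, scores = zero_shot_classify("the banks reacted as markets fell", labels)
print(predicted)
```

The same scoring loop applies to the other model families the abstract names: swapping `cosine(embed(...), embed(...))` for a cross-encoder or reranker score over the (text, label description) pair changes the scorer but not the argmax over labels.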

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Computational and Text Analysis Methods