A BERTology View of LLM Orchestrations: Token- and Layer-Selective Probes for Efficient Single-Pass Classification

Gonzalo Ariel Meyoyan; Luciano Del Corro

arXiv:2601.13288 · cs.CL · April 28, 2026

TL;DR

This paper reuses computation that a serving LLM already performs: lightweight probes trained on its hidden states predict classification labels in the same forward pass used for generation, avoiding the latency and VRAM cost of running separate classifier models.

Contribution

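The paper frames classification as representation selection over the full token-layer hidden-state tensor and proposes a two-stage aggregator that first summarizes tokens within each layer and then aggregates across the layer summaries, so that classification runs in the same forward pass as generation.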

Findings

01. Probes outperform logit-only reuse methods like MULI.

02. Probes are competitive with larger task-specific models.

03. Method generalizes across different architectures and model sizes.

Abstract

Production LLM systems often rely on separate models for safety and other classification-heavy steps, increasing latency, VRAM footprint, and operational complexity. We instead reuse computation already paid for by the serving LLM: we train lightweight probes on its hidden states and predict labels in the same forward pass used for generation. We frame classification as representation selection over the full token-layer hidden-state tensor, rather than committing to a fixed token or fixed layer (e.g., first-token logits or final-layer pooling). To implement this, we introduce a two-stage aggregator that (i) summarizes tokens within each layer and (ii) aggregates across layer summaries to form a single representation for classification. We instantiate this template with direct pooling, a 100K-parameter scoring-attention gate, and a downcast multi-head self-attention (MHA) probe with up…
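
The two-stage aggregation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes masked mean pooling over tokens for stage (i) and a softmax over one learned score per layer for stage (ii), and the names `two_stage_probe`, `layer_scores`, `W`, and `b` are hypothetical placeholders for parameters that would normally be learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def two_stage_probe(hidden, mask, layer_scores, W, b):
    """Illustrative probe over a token-layer hidden-state tensor.

    hidden:       (L, T, D) hidden states for L layers, T tokens, width D
    mask:         (T,) 1 for real tokens, 0 for padding
    layer_scores: (L,) hypothetical learned score per layer
    W, b:         (D, C) and (C,) hypothetical linear classifier
    """
    m = mask.astype(hidden.dtype)
    # Stage (i): summarize tokens within each layer (masked mean) -> (L, D)
    layer_summaries = (hidden * m[None, :, None]).sum(axis=1) / max(m.sum(), 1.0)
    # Stage (ii): aggregate across layer summaries (softmax-weighted sum) -> (D,)
    layer_weights = softmax(layer_scores)
    pooled = layer_weights @ layer_summaries
    # Classification head on the single pooled representation -> (C,)
    probs = softmax(pooled @ W + b)
    return probs, layer_weights

# Toy usage with random values standing in for real hidden states.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 6, 8))      # 4 layers, 6 tokens, width 8
mask = np.array([1, 1, 1, 1, 0, 0])      # last two positions are padding
W, b = rng.normal(size=(8, 3)), np.zeros(3)
probs, layer_weights = two_stage_probe(hidden, mask, np.zeros(4), W, b)
```

Because the probe only reads hidden states that the forward pass already produced, the extra cost in this sketch is two reductions and one small matrix product. The attention-based variants named in the abstract would replace the fixed mean and the per-layer scores with learned, input-dependent weights.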
