Semantic Reranking at Inference Time for Hard Examples in Rhetorical Role Labeling
Anas Belfathi, Nicolas Hernandez, Laura Monceaux, Warren Bonnard, Richard Dufour

TL;DR
This paper introduces RISE, an inference-time semantic reranking method that improves language model predictions on hard examples in Rhetorical Role Labeling by leveraging label semantics, without retraining.
Contribution
RISE is an inference-time framework that reranks model outputs using label semantics, improving performance on difficult instances across multiple datasets and models.
Findings
Average improvement of +9.15 macro-F1 points on hard examples.
Effectiveness demonstrated across eight datasets and seven language models.
Moderate agreement (Cohen's kappa = 0.40) between model and human difficulty annotations.
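For readers unfamiliar with the agreement statistic cited above, Cohen's kappa compares observed agreement with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard formula. The function name `cohen_kappa` and the toy hard/easy flags are illustrative assumptions and are not taken from the paper's annotation data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal rate.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy example (made-up flags): 1 = "hard", 0 = "easy".
model_flags = [1, 1, 0, 0]
human_flags = [1, 0, 0, 0]
kappa = cohen_kappa(model_flags, human_flags)
```

In this toy case the raters agree on three of four items, but half of that agreement is expected by chance, so kappa is lower than the raw agreement rate.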
Abstract
Rhetorical Role Labeling (RRL) assigns a functional role to each sentence in a document and is widely used in legal, medical, and scientific domains. While language models (LMs) achieve strong average performance, they remain unreliable on hard examples, where prediction confidence is low. Existing approaches typically handle uncertainty implicitly and treat labels as discrete identifiers, overlooking the semantic information encoded in label names. We introduce RISE, an inference-time semantic reranking framework that leverages label semantics to refine predictions on hard instances. RISE automatically identifies low-confidence predictions and reranks model outputs using contrastively learned label representations, without retraining or modifying the underlying model. Experiments on eight domain-specific RRL datasets with seven LMs, including encoder-based and causal architectures,…
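The two steps named in the abstract, flagging low-confidence predictions and reranking them with label representations, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not the authors' implementation: the confidence threshold, the top-k candidate cut, the cosine-similarity score, the linear mixing weight `alpha`, and the toy two-dimensional embeddings. In RISE the label representations are contrastively learned; here they are hard-coded placeholders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(probs, sent_emb, label_embs, threshold=0.6, top_k=2, alpha=0.5):
    """Return a label, reranking only when the model's top probability is low.

    probs:      dict label -> model probability for one sentence
    sent_emb:   embedding of the sentence (hypothetical, any encoder)
    label_embs: dict label -> embedding of the label name (hypothetical)
    threshold, top_k, alpha: illustrative hyperparameters, not from the paper
    """
    ranked = sorted(probs, key=probs.get, reverse=True)
    # Confident prediction: keep the model's own top label untouched.
    if probs[ranked[0]] >= threshold:
        return ranked[0]
    # Hard example: rescore the top-k candidates with label semantics.
    candidates = ranked[:top_k]
    return max(
        candidates,
        key=lambda lab: alpha * probs[lab]
        + (1 - alpha) * cosine(sent_emb, label_embs[lab]),
    )


# Toy inputs (made up): the model slightly prefers FACT, but the sentence
# embedding lies much closer to the ARGUMENT label embedding.
label_embs = {"FACT": [1.0, 0.0], "ARGUMENT": [0.0, 1.0], "RULING": [1.0, 1.0]}
hard_probs = {"FACT": 0.45, "ARGUMENT": 0.40, "RULING": 0.15}
easy_probs = {"FACT": 0.90, "ARGUMENT": 0.07, "RULING": 0.03}
sent_emb = [0.1, 0.9]

hard_label = rerank(hard_probs, sent_emb, label_embs)
easy_label = rerank(easy_probs, sent_emb, label_embs)
```

Because the base model is only queried for its output probabilities, a wrapper of this kind leaves the model's weights unchanged, which matches the abstract's claim that no retraining or modification of the underlying model is needed.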
