TL;DR
This paper compares fine-tuned encoders and prompt-based LLMs for response clarity detection in political interviews, showing LLM ensembles outperform encoders, especially on minority classes.
Contribution
It demonstrates that prompt-based LLMs without task-specific tuning outperform fine-tuned encoders in response clarity detection tasks.
Findings
The LLM ensemble achieves 80 macro-F1 on the 3-class Task 1 (9th/41) and 59 on the 9-class Task 2 (3rd/33).
Prompt-based LLMs outperform fine-tuned encoders, especially on minority classes.
Enriching the input with the full interviewer turn improves LLM performance.
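Macro-F1 averages the per-class F1 scores with equal weight, which is why gains on minority classes move the headline number so much. The sketch below is a minimal pure-Python illustration, not the task's official scorer; the label names and toy predictions are hypothetical, and the paper's figures appear to be this quantity scaled by 100.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in sorted(set(gold) | set(pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: a classifier that always predicts the majority class.
gold = ["majority"] * 8 + ["minority"] * 2
pred = ["majority"] * 10

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
score = macro_f1(gold, pred)
```

On this toy data the accuracy is 0.8, but the macro-F1 is only 4/9 (about 0.44), because the minority class contributes an F1 of zero.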
Abstract
In this paper, we present our system for SemEval-2026 Task 6 (CLARITY) on response clarity and evasion detection in question-answer pairs from U.S. presidential interviews, comparing fine-tuned encoders with prompt-based LLMs. Our LLM ensemble achieves 80 macro-F1 on the 3-class Task 1 (9th/41) and 59 on the 9-class Task 2 (3rd/33). Across 8 transformer encoders optimized through a four-stage pipeline, partial encoder layer unfreezing outperforms full fine-tuning by a wide margin. Combining English and multilingual encoders further improves ensemble performance over either family alone, despite multilingual models being individually weaker. Prompt-based LLMs, without any task-specific parameter updates, outperform fine-tuned encoders, particularly on minority classes; among open-weight LLMs, parameter count does not predict performance. Enriched input, concatenating the full interviewer…
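The visible part of the abstract does not say how the ensemble combines its members' outputs. As one plausible reading, the sketch below assumes simple majority voting over predicted labels, with ties broken in favor of the earliest-listed model; the function name, model outputs, and labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote.

    Ties go to the tied label voted for by the earliest model in the list.
    This is an assumed combination rule, not the paper's confirmed method.
    """
    ensembled = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        top = max(counts.values())
        winner = next(v for v in votes if counts[v] == top)
        ensembled.append(winner)
    return ensembled


# Hypothetical outputs from three models on three question-answer pairs.
model_a = ["x", "y", "z"]
model_b = ["x", "z", "y"]
model_c = ["y", "z", "x"]

combined = majority_vote([model_a, model_b, model_c])
```

Here the third item is a three-way tie, so the vote falls back to `model_a`; in practice one might order the models by validation score so the strongest model breaks ties.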
