IntroLM: Introspective Language Models via Prefilling-Time Self-Evaluation
Hossein Hosseini Kasnavieh, Gholamreza Haffari, Chris Leckie, Adel N. Toosi

TL;DR
IntroLM enables large language models to self-assess output quality during prefilling, improving prediction accuracy and system efficiency without external evaluators.
Contribution
It introduces token-conditional LoRA that activates only for introspective tokens, allowing an LLM to predict its own output quality during prefilling while preserving the original backbone behavior.
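The gating idea behind token-conditional LoRA can be illustrated with a toy sketch. This is not the authors' implementation: the function name `token_conditional_lora`, the per-token boolean mask, the `scale` parameter, and the pure-Python matrices are all assumptions made for illustration, and this summary does not say where the introspective token is placed or which layers carry the adapter. The sketch only shows that ordinary tokens pass through the frozen weight unchanged, while the low-rank update is added for positions flagged as introspective.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def token_conditional_lora(
    hidden: List[Vector],          # one hidden vector per token position
    is_introspective: List[bool],  # hypothetical mask: True only for the introspective token
    W: Matrix,                     # frozen base weight (d_out x d_in)
    A: Matrix,                     # LoRA down-projection (r x d_in)
    B: Matrix,                     # LoRA up-projection (d_out x r)
    scale: float = 1.0,            # assumed LoRA scaling factor
) -> List[Vector]:
    outputs = []
    for h, flag in zip(hidden, is_introspective):
        out = matvec(W, h)  # base path, identical to the unmodified model
        if flag:
            # The low-rank update B(Ah) is applied only to the flagged position.
            delta = matvec(B, matvec(A, h))
            out = [o + scale * d for o, d in zip(out, delta)]
        outputs.append(out)
    return outputs


# Toy example: identity base weight and a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [-0.5]]
hidden = [[1.0, 2.0], [3.0, 4.0]]
outputs = token_conditional_lora(hidden, [False, True], W, A, B)
print(outputs)  # first row is the base output; second row includes the LoRA delta
```

Because the first position is not flagged, its output equals the base projection exactly, which is the property that keeps the backbone's behavior intact in this toy setting.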
Findings
Achieves a ROC AUC of 90% for success prediction on question answering benchmarks with Qwen3 8B.
Reduces latency by up to 33% and model usage by 50% in multi-model routing.
Outperforms external classifiers such as DeBERTa in quality prediction.
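To make the multi-model routing finding concrete, here is a minimal sketch of how a predicted success probability could drive routing. The policy shown (try models from cheapest to most expensive and pick the first whose predicted success clears a threshold) is an assumption for illustration, not the routing rule described in the paper. The names `route`, `threshold`, and the stand-in predictor functions are hypothetical; in practice the score would come from each model's own prefilling-time self-evaluation.

```python
from typing import Callable, List, Tuple

# Each entry pairs a model name with a function that returns that model's
# predicted probability of answering the query well (a stand-in here).
ModelEntry = Tuple[str, Callable[[str], float]]


def route(query: str, models: List[ModelEntry], threshold: float = 0.5) -> str:
    """Return the first (cheapest) model whose predicted success meets the threshold.

    `models` is assumed to be ordered from cheapest to most expensive.
    If no model clears the threshold, fall back to the last one.
    """
    if not models:
        raise ValueError("at least one model is required")
    for name, predict_success in models:
        if predict_success(query) >= threshold:
            return name
    return models[-1][0]


# Stand-in predictors: the small model is only confident on short queries.
models: List[ModelEntry] = [
    ("small", lambda q: 0.9 if len(q) < 20 else 0.2),
    ("large", lambda q: 0.95),
]

easy_choice = route("What is 2 + 2?", models)
hard_choice = route("Summarize the main argument of this long legal brief.", models)
print(easy_choice, hard_choice)
```

Under this assumed policy, the large model is invoked only when the small model predicts it is likely to fail, which is the kind of mechanism that could reduce latency and model usage.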
Abstract
A major challenge in operating large language models (LLMs) is predicting whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT-based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that uses introspective tokens to let causal language models predict their own output quality during the prefilling phase without affecting generation. By introducing token-conditional LoRA that activates only for the introspective token, the model learns to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question answering benchmarks, IntroLM applied to Qwen3 8B achieves a ROC AUC of 90 percent for…
