How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework

Hamidreza Saghir

arXiv:2605.00269 · cs.CL · May 4, 2026

TL;DR

This paper introduces a two-pathway framework for understanding how language models process out-of-distribution (OOD) inputs. It distinguishes content-based signals (embeddings) from processing-based signals (hidden-state trajectories), and shows that existing white-box confidence scores are confounded by sequence length.

Contribution

It proposes a two-pathway framework that separates embedding content from processing trajectory, and uses it to identify OOD signals that remain after controlling for sequence length in LLMs.

Findings

01 Embedding methods excel on vocabulary-distinctive OOD tasks.

02 Trajectory features effectively detect covert-intent inputs.

03 Attention circuits are more engaged in adversarial tasks.

Abstract

Recent white-box OOD detection methods for LLMs -- including CED, RAUQ, and WildGuard confidence scores -- appear effective, but we show they are structurally confounded by sequence length (|r| >= 0.61) and collapse to near-chance under length-matched evaluation. Even raw attention entropy (mean H(alpha) across heads and layers), a natural baseline we include for completeness, shows the same confound. The confound stems from attention's Theta(log T) dependence on input length. To identify genuine OOD signals after deconfounding, we propose a two-pathway framework: embeddings capture what text is about (effective for topic shifts), while the processing trajectory -- hidden-state evolution across layers -- captures how the model processes input. The relative power of each pathway varies along a vocabulary-transparency spectrum: embedding methods excel on vocabulary-distinctive OOD, while…
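The length dependence described in the abstract can be illustrated with a small synthetic sketch. This is not the paper's experimental setup: the Gaussian attention scores, the chosen sequence lengths, and the helper names (`attention_entropy`, `mean_entropy_for_length`, `pearson_r`) are all assumptions made for illustration. The sketch only shows that the entropy of a softmax attention distribution over T positions is bounded by log T and, for random scores, tracks log T closely, so any score built from mean attention entropy will tend to correlate with input length.

```python
import math
import random


def attention_entropy(weights):
    """Shannon entropy (in nats) of a single attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_entropy_for_length(T, n_rows=200, seed=0):
    """Average entropy of n_rows synthetic attention rows over T positions.

    Scores are drawn from a standard normal; this is an illustrative
    assumption, not a model of any real attention head.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rows):
        scores = [rng.gauss(0.0, 1.0) for _ in range(T)]
        total += attention_entropy(softmax(scores))
    return total / n_rows


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical sequence lengths, chosen only for the demonstration.
lengths = [8, 16, 32, 64, 128, 256]
entropies = [mean_entropy_for_length(T) for T in lengths]

# Correlation between log T and the mean entropy "score".
r = pearson_r([math.log(T) for T in lengths], entropies)

for T, h in zip(lengths, entropies):
    print(f"T={T:4d}  mean H={h:.3f}  log T={math.log(T):.3f}")
print(f"Pearson r between log T and mean entropy: {r:.4f}")
```

Because entropy here rises with length regardless of content, a detector that thresholds on it can look effective whenever OOD inputs happen to differ in length from in-distribution ones. Comparing inputs of matched length removes that shortcut, which is the motivation for the length-matched evaluation the abstract refers to.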
