TL;DR
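This paper develops a regime theory for choosing among controller classes in language and vision-language models, where a controller decides on each input whether to answer directly, retrieve evidence, defer to a stronger model, or abstain. The theory weighs controller expressivity against finite-sample limits to determine which class to deploy.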
Contribution
It introduces a nested lattice of four controller classes (fixed actions, partition routers, instance-level controllers, and prior-gated controllers) and a regime theory that turns three data-estimable bottlenecks into a class choice.
Findings
The regime theory predicts which controller class is best on each benchmark.
Under identical strict cross-validation, different benchmarks prefer different controller classes, and the empirical choices match the predicted ones.
The prior-gated controller excels in OCR-based visual question answering.
Abstract
Deployed language and vision-language models must decide, on each input, whether to answer directly, retrieve evidence, defer to a stronger model, or abstain. Contrary to the common monotonicity intuition, greater per-input expressivity is not uniformly beneficial in finite samples: under identical strict cross-validation, different benchmarks prefer different controller classes. This reflects a finite-sample limitation of instance-level uncertainty signals, which can be exhausted at a distribution-dependent scale. We organize controllers into a nested lattice of four classes: fixed actions, partition routers, instance-level controllers, and prior-gated controllers, ordered by complexity. We prove a regime theory that turns three data-estimable bottlenecks into a class choice: how much improvement is possible beyond the best fixed action, whether there are enough samples for…
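The two simplest classes in the lattice, and the first bottleneck named in the abstract, can be illustrated with a small sketch. This is not the paper's implementation: the utility table, the cell labels `short` and `ocr`, and the function names are hypothetical, and `headroom` is only one plausible reading of "how much improvement is possible beyond the best fixed action" (per-input oracle utility minus the mean utility of the best fixed action).

```python
from collections import defaultdict

# The four actions named in the abstract.
ACTIONS = ["answer", "retrieve", "defer", "abstain"]


def best_fixed_action(utilities):
    """Return (action, mean utility) for the single action that is best on average.

    `utilities` is a list with one dict per input, mapping action -> utility.
    """
    n = len(utilities)
    means = {a: sum(u[a] for u in utilities) / n for a in ACTIONS}
    best = max(ACTIONS, key=lambda a: means[a])
    return best, means[best]


def fit_partition_router(utilities, cells):
    """Pick one action per partition cell (a fixed action within each cell)."""
    by_cell = defaultdict(list)
    for u, c in zip(utilities, cells):
        by_cell[c].append(u)
    return {c: best_fixed_action(us)[0] for c, us in by_cell.items()}


def headroom(utilities):
    """Assumed definition: per-input oracle utility minus best fixed-action utility."""
    oracle = sum(max(u.values()) for u in utilities) / len(utilities)
    return oracle - best_fixed_action(utilities)[1]


# Hypothetical toy data: two "short" inputs and two "ocr" inputs.
utilities = [
    {"answer": 1.0, "retrieve": 0.5, "defer": 0.6, "abstain": 0.0},
    {"answer": 1.0, "retrieve": 0.5, "defer": 0.6, "abstain": 0.0},
    {"answer": 0.0, "retrieve": 1.0, "defer": 0.6, "abstain": 0.0},
    {"answer": 0.0, "retrieve": 1.0, "defer": 0.6, "abstain": 0.0},
]
cells = ["short", "short", "ocr", "ocr"]

fixed = best_fixed_action(utilities)
router = fit_partition_router(utilities, cells)
gap = headroom(utilities)
```

On this toy table the best fixed action is `retrieve`, while the partition router sends `short` inputs to `answer` and `ocr` inputs to `retrieve`. In practice the router would be fit and scored on separate cross-validation folds, since a richer class can overfit when each cell has few samples.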