Reducing the Scope of Language Models
David Yunis, Siyu Huo, Chulaka Gunasekara, Danish Contractor

TL;DR
This paper evaluates various methods to restrict large language models to respond only to relevant queries, demonstrating that effective scoping is achievable through fine-tuning and Circuit Breakers, with practical guidance for deployment.
Contribution
It provides a comprehensive empirical comparison of scoping techniques for LLMs, including prompting, fine-tuning, preference learning, and Circuit Breakers, across diverse tasks and models.
Findings
Supervised fine-tuning gives the best results when diverse examples of irrelevant queries are available.
Circuit Breakers perform well when that diversity is low.
Layering the two methods in succession often captures the benefits of both.
Abstract
Large language models (LLMs) are deployed in a wide variety of user-facing applications. Typically, these deployments have some specific purpose, like answering questions grounded on documentation or acting as coding assistants, but they require general language understanding. In such deployments, LLMs should respond only to queries that align with the intended purpose and reject all other requests, such as generating poetry or answering questions about physics, a task we refer to as 'scoping'. We conduct a comprehensive empirical evaluation of various methods, ranging from prompting, fine-tuning to preference learning and the recently proposed general alignment technique known as Circuit Breakers (CB). Across three families of language models and a broad variety of tasks, we show that it is possible to scope language models. We examine scoping for multiple topics, and fine-grained…
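To make the scoping task concrete, here is a minimal sketch of how one might measure it: count how often a model answers in-scope queries and how often it rejects out-of-scope ones. This is not the paper's evaluation protocol. The keyword-based refusal detector, the `REFUSAL_MARKERS` list, the `scoping_rates` helper, and the `toy_model` stand-in are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Iterable

# Hypothetical refusal phrases; a real evaluation would need a more robust detector.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "outside the scope")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any of the assumed refusal phrases."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def scoping_rates(
    model: Callable[[str], str],
    in_scope: Iterable[str],
    out_of_scope: Iterable[str],
) -> Dict[str, float]:
    """Compute the accept rate on in-scope queries and the reject rate on out-of-scope ones."""
    in_scope = list(in_scope)
    out_of_scope = list(out_of_scope)
    if not in_scope or not out_of_scope:
        raise ValueError("both query sets must be non-empty")
    accepted = sum(not is_refusal(model(q)) for q in in_scope)
    rejected = sum(is_refusal(model(q)) for q in out_of_scope)
    return {
        "in_scope_accept_rate": accepted / len(in_scope),
        "out_of_scope_reject_rate": rejected / len(out_of_scope),
    }


def toy_model(query: str) -> str:
    """Stand-in for a scoped coding assistant (assumption: keyword routing only)."""
    lowered = query.lower()
    if "code" in lowered or "function" in lowered:
        return "Here is a sketch of what you asked for."
    return "Sorry, that request is outside the scope of this assistant."


if __name__ == "__main__":
    rates = scoping_rates(
        toy_model,
        in_scope=["Write a function to sort a list", "Fix this code"],
        out_of_scope=["Write a poem about autumn", "Explain quantum tunneling"],
    )
    print(rates)
```

A well-scoped model would score high on both rates at once; a model that refuses everything or answers everything maximizes only one of them.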
Taxonomy
Topics: Power Systems and Technologies
Methods: Shrink and Fine-Tune
