Revealing the Inherent Instructability of Pre-Trained Language Models
Seokhyun An, Minji Kim, Hyounghun Kim

TL;DR
This paper shows that pre-trained language models already understand instructions and can recognize unsafe queries without explicit instruction tuning: fine-tuning on responses alone, with no instruction-response mapping, is enough to surface both abilities.
Contribution
It introduces Response Tuning (RT), a variant of instruction tuning that drops the instruction and trains models solely on responses, revealing their innate instruction-following and safety-recognition abilities.
Findings
Response Tuning models respond to a wide range of instructions much as their instruction-tuned counterparts do.
Models recognize and reject unsafe queries after response-only training.
Inherent instruction understanding extends to in-context learning.
Abstract
Instruction tuning -- supervised fine-tuning using instruction-response pairs -- is a key step in making pre-trained large language models (LLMs) instructable. Meanwhile, LLMs perform multitask learning during their pre-training, acquiring extensive knowledge and capabilities. We hypothesize that the pre-training stage can enable them to develop the ability to comprehend and address instructions. To verify this, we propose Response Tuning (RT), which removes the instruction and its corresponding mapping to the response from instruction tuning. Instead, it focuses solely on establishing a response distribution. Our experiments demonstrate that RT models, trained only on responses, can effectively respond to a wide range of instructions akin to their instruction-tuned counterparts. In addition, we observe that the models can recognize and reject unsafe queries after learning a safety…
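The difference between the two training setups described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace `tokenize` helper, the `build_it_example` and `build_rt_example` function names, and the choice to mask instruction tokens out of the instruction-tuning loss are all hypothetical stand-ins, and the paper's actual prompt template is not reproduced here.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Toy whitespace tokenizer (assumption: stands in for a real subword tokenizer)."""
    return text.split()


def build_it_example(instruction: str, response: str) -> Dict[str, list]:
    """Instruction tuning sketch: the model conditions on the instruction.

    Assumption: loss is computed only on response tokens (mask = 1),
    a common but not universal convention.
    """
    prompt = tokenize(instruction)
    target = tokenize(response)
    return {
        "tokens": prompt + target,
        "loss_mask": [0] * len(prompt) + [1] * len(target),
    }


def build_rt_example(response: str) -> Dict[str, list]:
    """Response Tuning sketch: the instruction is removed from the example.

    Only the response is present, so the model can learn a response
    distribution but no instruction-to-response mapping.
    """
    target = tokenize(response)
    return {"tokens": target, "loss_mask": [1] * len(target)}


if __name__ == "__main__":
    pair = ("Name a primary color .", "Blue is a primary color .")
    print(build_it_example(*pair))
    print(build_rt_example(pair[1]))
```

In this sketch the RT example never contains instruction tokens, so any instruction-following behavior at inference time would have to come from what the model learned during pre-training, which is the hypothesis the abstract sets out to test.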
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
