Revealing the Inherent Instructability of Pre-Trained Language Models

Seokhyun An; Minji Kim; Hyounghun Kim

arXiv:2410.02465·cs.CL·September 16, 2025

Revealing the Inherent Instructability of Pre-Trained Language Models

Seokhyun An, Minji Kim, Hyounghun Kim

PDF

Open Access 1 Repo 1 Video

TL;DR

This paper shows that pre-trained language models inherently understand instructions and safety, even without explicit instruction tuning, by focusing on response distribution during training.

Contribution

It introduces Response Tuning, a novel method that trains models solely on responses, revealing their innate instruction-following and safety recognition abilities.

Findings

01

Response Tuning models respond effectively to instructions.

02

Models recognize and reject unsafe queries after response-only training.

03

Inherent instruction understanding extends to in-context learning.

Abstract

Instruction tuning -- supervised fine-tuning using instruction-response pairs -- is a key step in making pre-trained large language models (LLMs) instructable. Meanwhile, LLMs perform multitask learning during their pre-training, acquiring extensive knowledge and capabilities. We hypothesize that the pre-training stage can enable them to develop the ability to comprehend and address instructions. To verify this, we propose Response Tuning (RT), which removes the instruction and its corresponding mapping to the response from instruction tuning. Instead, it focuses solely on establishing a response distribution. Our experiments demonstrate that RT models, trained only on responses, can effectively respond to a wide range of instructions akin to their instruction-tuned counterparts. In addition, we observe that the models can recognize and reject unsafe queries after learning a safety…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

seokhyunan/response-tuning
noneOfficial

Videos

Revealing the Inherent Instructability of Pre-Trained Language Models· underline

Taxonomy

TopicsTopic Modeling · Natural Language Processing Techniques