REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback

Aniruddha Roy; Pretam Ray; Abhilash Nandy; Somak Aditya; Pawan Goyal

arXiv:2505.06548·cs.CL·May 13, 2025

TL;DR

This paper introduces REFINE-AF, a task-agnostic framework that uses self-generated instructions and reinforcement learning to improve open-source LLMs, reducing costs and human effort while enhancing task performance.

Contribution

The paper presents a semi-automated, RL-enhanced instruction generation framework for open-source LLMs, outperforming prior methods that relied on expensive API-only models.

Findings

01 RL-based frameworks improve performance in 63-66% of tasks

02 Open-source LLMs can effectively generate instructions with reduced human effort

03 Cost-effective alternative to API-dependent instruction tuning

Abstract

Instruction-based Large Language Models (LLMs) have proven effective in numerous few-shot or zero-shot Natural Language Processing (NLP) tasks. However, creating human-annotated instruction data is time-consuming, expensive, and often limited in quantity and task diversity. Previous research has attempted to address this challenge by proposing frameworks that generate instructions in a semi-automated and task-agnostic manner directly from the model itself. Many of these efforts have relied on large API-only models such as GPT-3.5 (175B), which are expensive and subject to limits on the number of queries. This paper explores the performance of three small open-source LLMs, LLaMA 2-7B, LLaMA 2-13B, and Mistral 7B, using a semi-automated framework, thereby reducing the human intervention, effort, and cost required to generate an instruction dataset for…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning and Data Classification

Methods: Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · Linear Layer · Weight Decay · Adam