Applying RLAIF for Code Generation with API-usage in Lightweight LLMs

Sujan Dutta; Sayantan Mahinder; Raviteja Anantha; Bortik Bandyopadhyay

arXiv:2406.20060 · cs.CL · July 1, 2024

TL;DR

This paper presents an RLAIF framework that improves API-usage code generation in lightweight (<1B parameters) LLMs by training a reward model on AI feedback from a larger model (GPT-3.5). It raises the code executability rate by 4.5% and lets a 780M-parameter model outperform a 7B-parameter baseline.

Contribution

It introduces an RLAIF approach for enhancing API-usage code generation in small LLMs, utilizing feedback from larger models to train reward models for better alignment.

Findings

01 · 4.5% increase in code executability rate

02 · Smaller LLM (780M) outperforms larger baseline (7B)

03 · Effective use of AI feedback from GPT-3.5
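
The first finding is stated in terms of a code executability rate. The summary does not describe how the paper computes it, so the following is only a minimal sketch under one plausible reading: the fraction of generated Python snippets that run to completion with exit code 0. The function names `is_executable` and `executability_rate` and the 10-second timeout are assumptions made for illustration, not the authors' pipeline.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def is_executable(code: str, timeout_s: float = 10.0) -> bool:
    """Return True if the snippet runs to completion with exit code 0.

    Hypothetical helper: the timeout and the pass criterion are assumptions.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            # Run in a separate interpreter so a crash cannot affect the caller.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


def executability_rate(snippets: list[str]) -> float:
    """Fraction of snippets that execute successfully (0.0 for an empty list)."""
    if not snippets:
        return 0.0
    return sum(is_executable(s) for s in snippets) / len(snippets)
```

A subprocess gives process isolation only, not sandboxing; untrusted model output should be run inside a container or similar restricted environment.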

Abstract

Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains, including mitigating harm in LLM outputs, enhancing text summarization, and mathematical reasoning. This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (<1B parameters) LLMs. We specifically focus on code generation tasks that require writing appropriate API calls, which is challenging due to the well-known issue of hallucination in LLMs. Our framework extracts AI feedback from a larger LLM (e.g., GPT-3.5) through a specialized prompting strategy and uses this data to train a reward model towards better alignment from smaller LLMs. We run our experiments on the Gorilla dataset and meticulously assess the quality of the model-generated code across various metrics, including AST, ROUGE, and Code-BLEU, and develop a pipeline to…
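
The abstract lists AST among its metrics and stresses that the task is writing appropriate API calls. The exact AST metric is not given in this summary, so the sketch below is a deliberately simplified stand-in: it parses generated Python code and checks whether an expected dotted call name appears. The helper names `called_names` and `uses_expected_api`, and the example API string, are hypothetical.

```python
import ast


def called_names(code: str) -> set[str]:
    """Collect the dotted name of every call, e.g. 'torch.hub.load'."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            parts = []
            target = node.func
            # Unwind attribute chains such as a.b.c into ['c', 'b'] then 'a'.
            while isinstance(target, ast.Attribute):
                parts.append(target.attr)
                target = target.value
            if isinstance(target, ast.Name):
                parts.append(target.id)
                names.add(".".join(reversed(parts)))
    return names


def uses_expected_api(code: str, expected_call: str) -> bool:
    """True if the code parses and calls expected_call (simplified check)."""
    try:
        return expected_call in called_names(code)
    except SyntaxError:
        # Unparseable output cannot contain a valid API call.
        return False


generated = "import torch\nmodel = torch.hub.load('pytorch/vision', 'resnet18')\n"
print(uses_expected_api(generated, "torch.hub.load"))
```

This check ignores call arguments entirely, so it would accept a correct function name with hallucinated parameters; a stricter comparison would also need to inspect `node.args` and `node.keywords`.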

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Digital Rights Management and Security · Service-Oriented Architecture and Web Services

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Multi-Head Attention · Softmax · Layer Normalization · Linear Warmup With Cosine Annealing