Applying RLAIF for Code Generation with API-usage in Lightweight LLMs
Sujan Dutta, Sayantan Mahinder, Raviteja Anantha, Bortik Bandyopadhyay

TL;DR
This paper presents an RLAIF framework that improves API-usage code generation in lightweight LLMs by using AI feedback from a larger model. It raises the code executability rate by 4.5% and lets a 780M-parameter model outperform a 7B baseline.
Contribution
The paper introduces an RLAIF approach for API-usage code generation in small LLMs: feedback from a larger model is used to train a reward model, which in turn aligns the smaller model.
Findings
4.5% increase in code executability rate
Smaller LLM (780M) outperforms larger baseline (7B)
Effective use of AI feedback from GPT-3.5
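The code executability rate reported above can be read as the fraction of generated snippets that run without error. A minimal sketch, assuming each snippet is a standalone Python program and that "executable" means exiting with status 0 within a time limit. The function name `executability_rate` and the timeout are hypothetical, and the paper's own pipeline may differ:

```python
import subprocess
import sys


def executability_rate(snippets, timeout_s=10.0):
    """Return the fraction of snippets that exit with status 0.

    Assumption: each snippet is a self-contained Python program.
    """
    if not snippets:
        return 0.0
    passed = 0
    for code in snippets:
        try:
            # Run each snippet in a fresh interpreter so failures stay isolated.
            result = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # a snippet that hangs counts as non-executable
        if result.returncode == 0:
            passed += 1
    return passed / len(snippets)
```

Model-generated code is untrusted, so in practice a check like this would normally run inside a sandbox or container rather than directly on the host.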
Abstract
Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains, including mitigating harm in LLM outputs, enhancing text summarization, and mathematical reasoning. This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (<1B parameters) LLMs. We specifically focus on code generation tasks that require writing appropriate API calls, which is challenging due to the well-known issue of hallucination in LLMs. Our framework extracts AI feedback from a larger LLM (e.g., GPT-3.5) through a specialized prompting strategy and uses this data to train a reward model towards better alignment from smaller LLMs. We run our experiments on the Gorilla dataset and meticulously assess the quality of the model-generated code across various metrics, including AST, ROUGE, and Code-BLEU, and develop a pipeline to…
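To illustrate what an AST-based check on API usage can look like, here is a simplified stand-in. It only tests whether a generated snippet calls an expected function by its dotted name; it is not the metric used in the paper or in the Gorilla benchmark, and the helper names `called_names` and `uses_api` are hypothetical:

```python
import ast


def called_names(code):
    """Collect the source form of every called function, e.g. 'torch.hub.load'."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            # ast.unparse (Python 3.9+) turns the callee node back into text.
            names.add(ast.unparse(node.func))
    return names


def uses_api(code, expected_call):
    """Return True if the snippet parses and calls expected_call."""
    try:
        return expected_call in called_names(code)
    except SyntaxError:
        return False  # code that does not parse cannot match


snippet = "import torch\nmodel = torch.hub.load('some/repo', 'some_model')"
print(uses_api(snippet, "torch.hub.load"))
```

A hallucinated API name, or code that fails to parse, would return False under this check, which is the failure mode the abstract highlights for API-calling tasks.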
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Digital Rights Management and Security · Service-Oriented Architecture and Web Services
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Multi-Head Attention · Softmax · Layer Normalization · Linear Warmup With Cosine Annealing
