TL;DR
CodeEvo is an interaction-driven, iterative framework in which two LLM agents synthesize high-quality instruction-code pairs, significantly improving code generation performance over existing methods.
Contribution
It introduces an iterative synthesis framework built on two interacting LLM agents, a Coder and a Reviewer, together with a hybrid feedback mechanism for more rigorous and effective code-centric data synthesis.
Findings
Models trained on CodeEvo data outperform baselines on code benchmarks.
The hybrid feedback mechanism improves data quality and synthesis efficiency.
Extensive analysis provides insights into effective code data generation.
Abstract
Acquiring high-quality instruction-code pairs is essential for training Large Language Models (LLMs) for code generation. Manually curated data is expensive and inherently limited in scale, motivating the development of code-centric synthesis methods. Yet, current approaches either focus on augmenting existing code or rely on predefined heuristics, both lacking rigorous data validation, which results in synthetic data that is ungrounded, repetitive, or overly simplistic. Inspired by collaborative programming practices, we propose CodeEvo, a framework that synthesizes code data through iterative interactions between two LLM agents: a Coder, which generates candidate code and test cases based on given instructions, and a Reviewer, which guides the synthesis process by producing new instructions and feedback. We further introduce a hybrid feedback mechanism that combines compiler…
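The Coder/Reviewer interaction described in the abstract can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the names `Sample`, `run_candidate`, `synthesize`, `toy_coder`, and `toy_reviewer` are hypothetical, the two agents are plain Python callables standing in for LLM calls, and a simple execution check stands in for the feedback step (the abstract's description of the hybrid feedback mechanism is cut off above, so it is not reproduced here).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    """One accepted instruction-code pair plus the tests that validated it."""
    instruction: str
    code: str
    tests: str


def run_candidate(code: str, tests: str) -> Tuple[bool, str]:
    """Run the candidate code, then its tests, in a fresh namespace.

    Returns (passed, message). NOTE: exec() is only safe for trusted toy
    input; a real pipeline would need a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # includes SyntaxError and AssertionError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


# Hypothetical agent signatures (assumptions, not from the paper):
#   coder(instruction, feedback)    -> (code, tests)
#   reviewer(instruction, feedback) -> next instruction
Coder = Callable[[str, Optional[str]], Tuple[str, str]]
Reviewer = Callable[[str, Optional[str]], str]


def synthesize(seed: str, coder: Coder, reviewer: Reviewer,
               rounds: int = 3, max_attempts: int = 2) -> List[Sample]:
    """Alternate Coder and Reviewer turns, keeping only validated pairs."""
    accepted: List[Sample] = []
    instruction = seed
    for _ in range(rounds):
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            code, tests = coder(instruction, feedback)
            passed, message = run_candidate(code, tests)
            if passed:
                accepted.append(Sample(instruction, code, tests))
                break
            feedback = message  # failed run becomes feedback for the retry
        # The Reviewer proposes the next instruction for the following round.
        instruction = reviewer(instruction, feedback)
    return accepted


def toy_coder(instruction: str, feedback: Optional[str]) -> Tuple[str, str]:
    """Stand-in Coder: buggy on the first try, correct once it gets feedback."""
    tests = "assert add(2, 3) == 5"
    if feedback is None:
        return "def add(a, b):\n    return a - b", tests
    return "def add(a, b):\n    return a + b", tests


def toy_reviewer(instruction: str, feedback: Optional[str]) -> str:
    """Stand-in Reviewer: derives a new instruction from the previous one."""
    return instruction + " (harder)"


samples = synthesize("add two numbers", toy_coder, toy_reviewer, rounds=2)
```

In this toy run the first attempt of each round fails its own test, the error message is passed back to the Coder, and only the corrected candidate is stored, so unvalidated code never enters the dataset.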
Code & Models
- 🤗 internlm/JanusCoder-8B (model): 43 downloads, 13 likes
- 🤗 internlm/JanusCoder-14B (model): 22 downloads, 34 likes
- 🤗 internlm/JanusCoderV-7B (model): 62 downloads, 14 likes
- 🤗 internlm/JanusCoderV-8B (model): 70 downloads, 13 likes
- 🤗 cyankiwi/JanusCoder-14B-AWQ-4bit (model): 10 downloads
- 🤗 cyankiwi/JanusCoder-14B-AWQ-8bit (model): 1 download
- 🤗 cyankiwi/JanusCoder-8B-AWQ-8bit (model): 1 download
- 🤗 cyankiwi/JanusCoder-8B-AWQ-4bit (model): 9 downloads
- 🤗 unsloth/JanusCoder-8B-GGUF (model): 357 downloads, 3 likes
- 🤗 unsloth/JanusCoder-8B (model): 18 downloads
