Lemur: Harmonizing Natural Language and Code for Language Agents

Yiheng Xu; Hongjin Su; Chen Xing; Boyu Mi; Qian Liu; Weijia Shi; Binyuan Hui; Fan Zhou; Yitao Liu; Tianbao Xie; Zhoujun Cheng; Siheng Zhao; Lingpeng Kong; Bailin Wang; Caiming Xiong; Tao Yu

arXiv:2310.06830·cs.CL·August 27, 2024·6 cites

TL;DR

Lemur and Lemur-Chat are open-source language models that effectively combine natural language understanding and coding skills, enabling versatile language agents capable of reasoning, planning, and environment interaction.

Contribution

The paper introduces Lemur and Lemur-Chat, novel models that balance language and coding capabilities, outperforming existing open-source models on diverse benchmarks.

Findings

01 Achieves state-of-the-art averaged performance across diverse text and coding benchmarks among open-source models.

02 Outperforms existing open-source models on agent tasks involving human interaction and tool use.

03 Narrows the gap with proprietary models in agent proficiency.

Abstract

We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in either. Through meticulous pre-training using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks among open-source models.…

Peer Reviews

Decision·ICLR 2024 spotlight

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

**Originality:** This paper proposes a novel way of training LLMs with code + text data to design language agents. **Quality:** There are detailed studies included in the paper about how training LLMs can be beneficial to solve both the language and agent tasks. **Clarity:** The paper is well-written and easy to follow.

Weaknesses

**Ambiguous Motivation:** I am not fully convinced by the sentence "for the construction of language agents, it is imperative for language models to possess harmonized capabilities in both natural language and programming languages." It's unclear how programming languages correlate with language understanding. In fact, in the context of linguistics (morphology, syntax, and semantics), programming languages might not satisfy any of them. I believe the authors should provide more context for it. Al

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

This article validates the effectiveness of the Lemur model on a large number of benchmarks and verifies the importance of balanced language and coding capabilities for language agent scenarios.

Weaknesses

1) The technical contribution of this article is quite limited; it merely continues training the Llama model on a mixture of text and code data and then instruction-tunes it on four datasets. 2) When comparing performance on the code benchmark, the authors use a large 70B model, but the code-specific models they compare with are mostly 15-30B in size, which makes the comparison somewhat unequal.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. It improves the coding ability while maintaining the reasoning ability of Llama-2. 2. Lemur is pre-trained and fine-tuned on a rich dataset that includes text and code, ensuring balanced performance across a variety of text and coding benchmarks. 3. The model showcases proficiency in agent tasks, encompassing human communication, tool usage, and interaction across observable environments.

Weaknesses

1. It seems that pre-training is responsible for the coding ability and supervised fine-tuning is responsible for the natural language ability, while it remains vague how the proposed model balances these two abilities. 2. As shown in Tables 4, 5, and 7, the performance of the proposed model Lemur-70B-Chat falls short of GPT-4, and this discrepancy lacks an explanation or discussion. 3. Table 3 lists three baseline models—StarCoder-15B, StarCo

Code & Models

Repositories

openlemur/lemur
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications