MojoBench: Language Modeling and Benchmarks for Mojo

Nishat Raihan; Joanna C. S. Santos; Marcos Zampieri

arXiv:2410.17736·cs.CL·October 24, 2024·2 cites

TL;DR

MojoBench is a framework for Mojo code generation. It pairs HumanEval-Mojo, a benchmark dataset for evaluating code LLMs on Mojo, with Mojo-Coder, an LLM pretrained and finetuned for Mojo that the authors report improves on GPT-4o and Claude-3.5-Sonnet by 30-35%.

Contribution

This work is the first to develop a dedicated benchmark and pretrained model for Mojo code generation, expanding LLM evaluation to emerging programming languages.

Findings

01

Mojo-Coder outperforms GPT-4o and Claude-3.5-Sonnet by 30-35%

02

MojoBench provides a comprehensive evaluation framework for Mojo code generation

03

Insights into LLM behavior with underrepresented programming languages

Abstract

The Mojo programming language (PL), recently introduced by Modular, has received significant attention in the scientific community due to its claimed speed boost over Python. Despite advancements in code Large Language Models (LLMs) across various PLs, Mojo remains unexplored in this context. To address this gap, we introduce MojoBench, the first framework for Mojo code generation. MojoBench includes HumanEval-Mojo, a benchmark dataset designed for evaluating code LLMs on Mojo, and Mojo-Coder, the first LLM pretrained and finetuned for Mojo code generation, which supports instructions in 5 natural languages (NLs). Our results show that Mojo-Coder achieves a 30-35% performance improvement over leading models like GPT-4o and Claude-3.5-Sonnet. Furthermore, we provide insights into LLM behavior with underrepresented and unseen PLs, offering potential strategies for enhancing…
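
The abstract does not say which metric underlies the reported 30-35% improvement. HumanEval-style benchmarks conventionally report pass@k, so the sketch below shows the standard unbiased pass@k estimator as one plausible way such a benchmark could be scored. It is an illustration under that assumption, not the authors' evaluation code; the function names `pass_at_k` and `benchmark_pass_at_k` and the `(n, c)` input format are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that pass every unit test
    k: sampling budget being scored (1 <= k <= n)
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are failures)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Mean pass@k over a benchmark; results holds one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, `benchmark_pass_at_k([(10, 3), (10, 10)], k=1)` averages the per-problem estimates for two problems with 10 samples each. Comparing two models then means comparing these averages on the same problem set.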

Taxonomy

Topics: Multimedia Communication and Technology · Recommender Systems and Techniques · Video Analysis and Summarization

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings