MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Kaixin Li; Yuchen Tian; Qisheng Hu; Ziyang Luo; Zhiyong Huang; Jing Ma

arXiv:2404.09486·cs.CL·September 27, 2024·1 cites

TL;DR

This paper introduces MMCode, a new multimodal dataset with visual programming problems to evaluate and benchmark large language models' ability to interpret visual information for code generation, revealing current models' limitations.

Contribution

The creation of MMCode, the first multimodal dataset for visually rich programming problems, and an evaluation of state-of-the-art models' performance on these tasks.

Findings

01 Current models struggle with visually rich programming problems.

02 MMCode exposes gaps in vision-code model capabilities.

03 Benchmark results highlight the need for improved multimodal reasoning.

Abstract

Programming often involves converting detailed and complex specifications into code, a process during which developers typically use visual aids to convey concepts more effectively. While recent developments in Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, there is little work investigating whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multimodal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges harvested from 10 code competition websites, presenting significant challenges due to the extreme demand for reasoning abilities. Our experimental results show that current state-of-the-art models struggle to…

Code & Models

Datasets

likaixin/MMCode
dataset · 176 dl

Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Multimodal Machine Learning Applications