Multilingual Multimodal Software Developer for Code Generation

Linzheng Chai; Jian Yang; Shukai Liu; Wei Zhang; Liran Wang; Ke Jin; Tao Sun; Congnan Liu; Chenchen Zhang; Hualei Zhu; Jiaheng Liu; Xianjie Wu; Ge Zhang; Tianyu Liu; Zhoujun Li

arXiv:2507.08719 · cs.CL · July 14, 2025

TL;DR

This paper presents MM-Coder, a multilingual multimodal model that integrates visual design diagrams with textual instructions to improve code generation, supported by a new dataset and benchmark addressing multimodal challenges.

Contribution

Introduction of MM-Coder, a multimodal code generation model that combines visual and textual inputs, along with the MMc-Instruct dataset for training and the MMEval benchmark for evaluation.

Findings

1. MM-Coder improves code accuracy when given visual inputs.
2. MMEval reveals that capturing visual information remains challenging.
3. Multimodal instructions improve architectural alignment.

Abstract

The rapid advancement of Large Language Models (LLMs) has significantly improved code generation, yet most models remain text-only, neglecting crucial visual aids such as the diagrams and flowcharts used in real-world software development. To bridge this gap, we introduce MM-Coder, a Multilingual Multimodal software developer. MM-Coder integrates visual design inputs, namely Unified Modeling Language (UML) diagrams and flowcharts (termed Visual Workflow), with textual instructions to enhance code generation accuracy and architectural alignment. To enable this, we developed MMc-Instruct, a diverse multimodal instruction-tuning dataset that includes visual-workflow-based code generation, allowing MM-Coder to synthesize textual and graphical information as human developers do, unlike prior work that targets narrow tasks. Furthermore, we introduce MMEval, a new benchmark for evaluating multimodal code generation,…

Code & Models

Models

Hugging Face: Multilingual-Multimodal-NLP/MM-Coder-7B