CodeMMR: Bridging Natural Language, Code, and Image for Unified Retrieval

Jiahui Geng; Qing Li; Fengyu Cai; Fakhri Karray

arXiv:2604.15663·cs.SE·April 20, 2026

TL;DR

This paper introduces CodeMMR, a unified multimodal retrieval model that embeds natural language, code, and images into a shared space, improving code search and generation across visual and textual modalities.

Contribution

The paper presents MMCoIR, the first comprehensive benchmark for multimodal code information retrieval (IR), and proposes CodeMMR, a model that outperforms baselines in cross-modal retrieval and improves code generation fidelity.

Findings

01 CodeMMR outperforms baselines by 10 points on nDCG@10.

02 It generalizes well across multiple modalities and programming languages.

03 Integrating CodeMMR into RAG improves code generation and visual grounding.
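
For readers unfamiliar with the metric in the first finding, here is a minimal sketch of how nDCG@10 is commonly computed for a single query. It is not taken from the paper: the function names `dcg_at_k` and `ndcg_at_k` are illustrative, the gain is linear (some evaluations use 2^rel - 1 instead), and the ideal ranking is assumed to be obtainable by sorting the relevance labels passed in, which is only exact when the list covers every relevant item for the query.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances[i] is the relevance label of the item the system
    placed at position i. Assumption: the list contains every relevant item.
    """
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# The single relevant item is ranked second instead of first.
score = ndcg_at_k([0, 1, 0])
print(round(score, 4))
```

Scores lie between 0 and 1 per query and are usually averaged over all queries. A gain reported in "points" typically refers to that average on a 0 to 100 scale, though the summary above does not say so explicitly.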

Abstract

Code search, framed as information retrieval (IR), underpins modern software engineering and increasingly powers retrieval-augmented generation (RAG), improving code discovery, reuse, and the reliability of LLM-based coding. Yet existing code IR models remain largely text-centric and often overlook the visual and structural aspects inherent in programming artifacts such as web interfaces, data visualizations, SVGs, schematic diagrams, and UML. To bridge this gap, we introduce MMCoIR, the first comprehensive benchmark for evaluating multimodal code IR across five visual domains, eight programming languages, and eleven libraries, and show the difficulty of the task through extensive evaluation. We then propose CodeMMR, a unified retrieval model that jointly embeds natural language, code, and images into a shared semantic space through instruction-based multimodal alignment. CodeMMR…
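
To illustrate what retrieval in a shared semantic space means in practice, the sketch below ranks a small mixed corpus against a query by cosine similarity. It makes several assumptions that are not from the paper: the vectors are hand-written stand-ins for encoder outputs, the document ids are made up, and cosine similarity is used as the scoring function because it is a common choice for embedding retrieval. The model's actual encoder and scoring are not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, best match first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings: code, an image, and prose all live in one space,
# so a single similarity function can compare any pair of them.
corpus = {
    "plot.py": [0.9, 0.1, 0.0],    # code snippet
    "chart.png": [0.8, 0.2, 0.1],  # rendered image
    "README.md": [0.0, 0.1, 0.9],  # unrelated text
}
query = [1.0, 0.0, 0.0]  # stand-in for an embedded natural-language query

for doc_id, score in retrieve(query, corpus, top_k=2):
    print(doc_id, round(score, 3))
```

Because every modality maps into the same vector space, the ranking step does not need to know whether a candidate is code, text, or an image; only the encoding step is modality-specific.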
