GALA: Multimodal Graph Alignment for Bug Localization in Automated Program Repair
Zhuoyao Liu, Zhengran Zeng, Shu-Dong Huang, Yang Liu, Shikun Zhang, Wei Ye

TL;DR
GALA introduces a structured multimodal framework for bug localization in automated program repair (APR) that explicitly aligns visual UI elements in bug-report screenshots with code components to improve localization accuracy.
Contribution
It proposes a novel hierarchical structural alignment approach that explicitly models the relationships between visual elements and code, outperforming prior methods that convert screenshots into plain text.
Findings
GALA achieves state-of-the-art results on the SWE-bench Multimodal benchmark.
The framework effectively captures spatial relationships in UI images for precise bug localization.
Hierarchical alignment significantly improves the mapping between visual observations and code.
Abstract
Large Language Model (LLM)-based Automated Program Repair (APR) has shown strong potential on textual benchmarks, yet it struggles in multimodal scenarios where bugs are reported with GUI screenshots. Existing methods typically convert images into plain text, which discards critical spatial relationships and creates a severe disconnect between visual observations and code components, so that localization degrades into imprecise keyword matching. To bridge this gap, we propose GALA (Graph Alignment for Localization in APR), a framework that shifts multimodal APR from implicit semantic guessing to explicit structural reasoning. GALA operates in four stages: it first constructs an Image UI Graph to capture visual elements and their structural relationships; then performs file-level alignment by cross-referencing this UI graph with repository-level structures (e.g., file references) to locate…
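The first two stages named in the abstract (building an Image UI Graph, then aligning it with repository structure at the file level) can be illustrated with a toy sketch. This is not GALA's implementation: the class and function names (`UIElement`, `build_ui_graph`, `rank_files`), the three edge types, the assumption that element labels and bounding boxes come from some upstream detector or OCR step, and the 0.5 weight for propagating scores along file references are all hypothetical choices made only to show the general idea of structure-aware, rather than plain-text, matching.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UIElement:
    """Hypothetical UI element detected in a screenshot."""
    eid: str
    label: str                               # visible text, e.g. "Submit"
    box: tuple[float, float, float, float]   # (x0, y0, x1, y1) in pixels


def contains(outer, inner) -> bool:
    """True if bounding box `outer` fully encloses bounding box `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def build_ui_graph(elements):
    """Build directed edges (src, dst, relation) from box geometry.

    Relations (an assumption for this sketch): 'contains' for nesting,
    otherwise 'left_of' or 'above' when the boxes do not overlap.
    """
    edges = []
    for a in elements:
        for b in elements:
            if a.eid == b.eid:
                continue
            if contains(a.box, b.box):
                edges.append((a.eid, b.eid, "contains"))
            elif a.box[2] <= b.box[0]:
                edges.append((a.eid, b.eid, "left_of"))
            elif a.box[3] <= b.box[1]:
                edges.append((a.eid, b.eid, "above"))
    return edges


def rank_files(elements, files, refs, weight=0.5):
    """Rank files by overlap with UI label terms, plus one hop over references.

    files: path -> source text; refs: path -> set of paths that file references.
    A referenced file inherits `weight` times the base score of its referrer.
    """
    terms = {w.lower() for e in elements for w in e.label.split()}
    base = {p: sum(1 for t in terms if t in src.lower()) for p, src in files.items()}
    score = dict(base)
    for src_path, targets in refs.items():
        for dst in targets:
            if dst in score:
                score[dst] += weight * base.get(src_path, 0)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score, key=lambda p: (-score[p], p))


# Usage example with made-up data.
elements = [
    UIElement("page", "Login page", (0, 0, 400, 300)),
    UIElement("btn", "Submit", (150, 200, 250, 240)),
    UIElement("field", "Password", (100, 100, 300, 140)),
]
files = {
    "src/LoginForm.tsx": "export function LoginForm() { return <button>Submit</button>; } // password input",
    "src/App.tsx": "import { LoginForm } from './LoginForm'; render login page",
    "src/utils.ts": "export const add = (a, b) => a + b;",
}
refs = {"src/App.tsx": {"src/LoginForm.tsx"}}

edges = build_ui_graph(elements)
ranking = rank_files(elements, files, refs)
```

Here `edges` records that the page contains both the button and the password field and that the field sits above the button, and `ranking` places `src/LoginForm.tsx` first because it matches several label terms directly and is also referenced by `src/App.tsx`. Propagating along file references is one simple way to let repository structure, not just keyword overlap, influence the ranking.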
