NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation
Hossain Shaikh Saadi, Faria Alam, Mario Sanz-Guerrero, Minh Duc Bui, Manuel Mager, Katharina von der Wense

TL;DR
This paper introduces a multi-agent system for translating Bangla instructions into Python code, achieving high accuracy by iterative debugging and testing, and winning the BLP-2025 shared task.
Contribution
It presents a novel multi-agent pipeline that iteratively refines code solutions based on test failures, improving code generation from Bangla instructions.
Findings
Achieved a Pass@1 score of 95.4 in the shared task.
First place in the BLP-2025 code generation challenge.
Code is publicly available for further research.
Abstract
This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent-based pipeline. First, a code-generation agent produces an initial solution from the input instruction. The candidate program is then executed against the provided unit tests (pytest-style, assert-based). Only the failing cases are forwarded to a debugger agent, which reruns the tests, extracts error traces, and, conditioning on the error messages, the current program, and the relevant test cases, generates a revised solution. Using this approach, our submission achieved first place in the shared task with a score of 95.4. We also make our code public.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Topic Modeling · Software Testing and Debugging Techniques
