KForge: Program Synthesis for Diverse AI Hardware Accelerators
Taras Sereda, Tom St. John, Burak Bartan, Natalie Serrino, Sachin Katti, Zain Asgar

TL;DR
KForge is a framework in which two collaborating language model agents generate and optimize programs for AI hardware accelerators, enabling cross-platform program synthesis with minimal platform-specific input.
Contribution
An iterative, agent-based system that uses profiling data and cross-platform knowledge transfer to synthesize programs across diverse hardware accelerators.
Findings
Effective program synthesis for NVIDIA CUDA and Apple Metal.
Cross-platform knowledge transfer significantly improves generation quality.
Collaborative agents interpret diverse profiling data to guide optimization.
Abstract
GPU kernels are critical for ML performance but difficult to optimize across diverse accelerators. We present KForge, a platform-agnostic framework built on two collaborative LLM-based agents: a generation agent that produces and iteratively refines programs through compilation and correctness feedback, and a performance analysis agent that interprets profiling data to guide optimization. This agent-based architecture requires only a single-shot example to target new platforms. We make three key contributions: (1) introducing an iterative refinement system where the generation agent and performance analysis agent collaborate through functional and optimization passes, interpreting diverse profiling data (from programmatic APIs to GUI-based tools) to generate actionable recommendations that guide program synthesis for arbitrary accelerators; (2) demonstrating that the generation agent…
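The loop the abstract describes, with functional passes followed by optimization passes, can be sketched roughly as below. This is a minimal illustration and not the authors' implementation. The names `Attempt`, `refine`, `generate`, `evaluate`, and `analyze` are hypothetical, and the two agents are modeled as plain callables so that no particular LLM or profiler API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    """Outcome of compiling and running one candidate program (hypothetical type)."""
    source: str
    compiled: bool
    correct: bool
    runtime_ms: Optional[float] = None  # assumed to be set whenever correct is True


def refine(
    generate: Callable[[str, str], str],   # stands in for the generation agent
    evaluate: Callable[[str], Attempt],    # compile + correctness check + timing
    analyze: Callable[[Attempt], str],     # stands in for the performance analysis agent
    example: str,                          # the single-shot example for the target platform
    max_iters: int = 5,
) -> Optional[Attempt]:
    """Return the fastest correct candidate found, or None if none was correct."""
    feedback = ""
    best: Optional[Attempt] = None
    for _ in range(max_iters):
        source = generate(example, feedback)
        attempt = evaluate(source)
        if not attempt.compiled:
            # Functional pass: the candidate did not build.
            feedback = "functional: fix the compilation error"
        elif not attempt.correct:
            # Functional pass: the candidate built but produced wrong results.
            feedback = "functional: output does not match the reference"
        else:
            # Optimization pass: keep the fastest correct candidate and ask
            # the analysis agent for a recommendation based on profiling.
            if best is None or attempt.runtime_ms < best.runtime_ms:
                best = attempt
            feedback = "optimization: " + analyze(attempt)
    return best
```

In this sketch the feedback string is the only channel between the two agents. A fuller system would presumably pass structured compiler diagnostics and profiler output instead.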
