Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Satyam Kumar; Saurabh Jha

arXiv:2604.16498·cs.AR·April 21, 2026

TL;DR

Forge-UGC is a hardware-agnostic compiler for transformer models that improves compilation speed, reduces runtime latency and energy, and supports modern transformer components on heterogeneous accelerators.

Contribution

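Forge-UGC introduces a transparent four-phase compilation pipeline (graph capture, optimization, intermediate representation lowering, and backend scheduling) with novel optimization passes, reducing compilation cost and runtime overhead for transformer deployment on NPUs.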

Findings

01. Achieves 6.9x to 9.2x faster compilation than existing frameworks.

02. Reduces inference latency by 18.2% to 35.7%.

03. Lowers energy consumption per inference by 30.2% to 40.9%.

Abstract

We present Forge-UGC (FX Optimization and Register-Graph Engine for Universal Graph Compilation), a four-phase compiler for transformer deployment on heterogeneous accelerator hardware, validated on the Intel AI Boost NPU. Existing frameworks such as OpenVINO and ONNX Runtime often rely on opaque compilation pipelines, offer limited pass-level visibility, and manage buffers weakly, which can lead to higher compilation cost and runtime overhead. Forge-UGC addresses this with a hardware-agnostic design that separates graph capture, optimization, intermediate representation lowering, and backend scheduling. Phase 1 captures graphs with torch.export at the ATen operator level, supporting modern transformer components such as rotary position embeddings, grouped-query attention, and SwiGLU without manual decomposition. Phase 2 applies six optimization passes: dead code elimination, common subexpression…
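To make the first two Phase 2 passes more concrete, here is a minimal sketch on a toy graph. It is not Forge-UGC's implementation. The pass list in the abstract is cut off after "common subexpression", so the sketch assumes the standard common subexpression elimination technique. The `Node` class, the op names, and the function names are all hypothetical, and every op is assumed to be pure (no side effects).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical toy IR node; not the Forge-UGC or torch.fx node type."""
    name: str
    op: str
    inputs: tuple = ()


def eliminate_common_subexpressions(nodes):
    """Merge nodes that apply the same op to the same inputs.

    Assumes `nodes` is in topological order and every op is pure.
    Returns the surviving nodes and a map from removed to surviving names.
    """
    canonical = {}  # (op, inputs) -> name of the first node computing it
    rename = {}     # removed node name -> surviving node name
    result = []
    for node in nodes:
        # Rewrite inputs so they point at surviving nodes only.
        inputs = tuple(rename.get(i, i) for i in node.inputs)
        key = (node.op, inputs)
        if node.op != "placeholder":
            if key in canonical:
                rename[node.name] = canonical[key]
                continue
            canonical[key] = node.name
        result.append(Node(node.name, node.op, inputs))
    return result, rename


def eliminate_dead_code(nodes, outputs):
    """Keep only nodes that the graph outputs transitively depend on."""
    live = set(outputs)
    kept = []
    for node in reversed(nodes):
        if node.name in live:
            live.update(node.inputs)
            kept.append(node)
    kept.reverse()
    return kept


def optimize(nodes, outputs):
    """Run CSE first, then DCE, remapping outputs across the CSE renames."""
    nodes, rename = eliminate_common_subexpressions(nodes)
    outputs = [rename.get(o, o) for o in outputs]
    return eliminate_dead_code(nodes, outputs), outputs


graph = [
    Node("x", "placeholder"),
    Node("w", "placeholder"),
    Node("a", "matmul", ("x", "w")),
    Node("b", "matmul", ("x", "w")),  # duplicate of "a"
    Node("c", "silu", ("a",)),
    Node("d", "mul", ("c", "b")),
    Node("e", "neg", ("x",)),         # never used by the output
]

optimized, final_outputs = optimize(graph, ["d"])
for node in optimized:
    print(node.name, node.op, node.inputs)
```

In this toy graph the duplicate `matmul` node `b` is folded into `a`, and the unused `neg` node `e` is dropped. Running CSE before DCE is a choice made for this sketch; the order Forge-UGC uses is not visible in the truncated abstract.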
