TL;DR
HGQ-LUT is a LUT-aware training approach for DNNs that accelerates training by over 100x on modern GPUs while achieving state-of-the-art hardware efficiency, making deployment on FPGAs practical.
Contribution
It introduces a new LUT-aware training (LAT) method built on LUT-Dense and LUT-Conv layers that use accelerator-efficient tensor operations during training, automated accuracy-resource trade-off exploration, and integration into open-source tools for real-world FPGA deployment.
Findings
Accelerates training by over 100x on modern GPUs.
Provides state-of-the-art hardware efficiency for LUT-based DNNs.
Enables automated design and verification of hybrid architectures.
Abstract
Lookup-table (LUT) based neural networks can deliver ultra-low latency and excellent hardware efficiency on FPGAs by mapping arithmetic operations directly onto the logic primitives. However, state-of-the-art LUT-aware training (LAT) approaches remain difficult to use in practice: they are often orders of magnitude slower to train than conventional networks, require non-trivial manual tuning for hardware efficiency, and lack an end-to-end workflow. This work presents HGQ-LUT, integrated in https://github.com/calad0i/HGQ2, a new LAT approach that achieves state-of-the-art hardware efficiency while accelerating training by over 100 times on modern GPUs. HGQ-LUT introduces LUT-Dense and LUT-Conv layers that are implemented with regular, accelerator-efficient tensor operations during training, which are then compiled into logic LUTs for hardware. By combining these layers with fine-grained,…
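To make the "train with tensor operations, then compile into logic LUTs" idea concrete, here is a minimal sketch of the general principle: a function over a few quantized input bits can be enumerated exhaustively and stored as a table. This is not HGQ-LUT's implementation or the HGQ2 API. The uniform quantizer, the 4-bit widths, the `[-1, 1]` range, and the `trained_fn` stand-in (with made-up coefficients) are all assumptions chosen for illustration.

```python
import math

def quantize(x, bits, lo=-1.0, hi=1.0):
    """Clamp x to [lo, hi] and map it to an integer code in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * levels)

def dequantize(code, bits, lo=-1.0, hi=1.0):
    """Map an integer code back to its representative value in [lo, hi]."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)

def trained_fn(x):
    """Hypothetical stand-in for a small learned nonlinear function."""
    return math.tanh(1.7 * x - 0.3)

def compile_to_lut(fn, in_bits, out_bits):
    """Evaluate fn on every possible input code and store the output codes."""
    return [
        quantize(fn(dequantize(code, in_bits)), out_bits)
        for code in range(1 << in_bits)
    ]

IN_BITS, OUT_BITS = 4, 4  # assumed bit widths; 2**4 = 16 table entries
lut = compile_to_lut(trained_fn, IN_BITS, OUT_BITS)

def lut_forward(x):
    """Inference by table lookup: no arithmetic beyond input quantization."""
    return lut[quantize(x, IN_BITS)]

if __name__ == "__main__":
    for code, out in enumerate(lut):
        print(f"in={code:2d} ({dequantize(code, IN_BITS):+.3f}) -> out={out:2d}")
```

The table has `2**IN_BITS` entries, so its size grows exponentially with input width. That is why narrow, fine-grained bit widths matter for resource usage in LUT-based designs.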
