CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning
Sonu Kumar, Mukul Lokhande, Santosh Kumar Vishvakarma, and Adam Teman

TL;DR
CARMEN is a resource-efficient, runtime-adaptive deep learning inference engine built on CORDIC arithmetic. It trades accuracy against efficiency at runtime and achieves significant hardware savings and high performance on both ASIC and FPGA platforms.
Contribution
It introduces a CORDIC-based multi-precision engine with runtime adaptability: the CORDIC iteration depth selects between approximate and accurate execution, enabling flexible 8/16-bit precision and improved efficiency without hardware modifications.
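To make the runtime-adaptability idea concrete, here is a minimal Python sketch of a linear-mode CORDIC multiplier and a MAC built on it. It is an illustration under stated assumptions, not CARMEN's RTL: the names `cordic_multiply` and `cordic_mac` are hypothetical, weights are assumed normalized to |w| <= 1, and floating point stands in for the fixed-point datapath. What it shows is that the iteration count alone sets the accuracy: each extra iteration halves the worst-case error bound, so a controller can trade cycles for precision without changing the datapath.

```python
import math


def cordic_multiply(x: float, z: float, iterations: int) -> float:
    """Approximate x * z with linear-mode CORDIC, using only shifts and adds.

    Each iteration resolves one more signed-binary digit of z, so for |z| <= 1
    the absolute error is at most |x| * 2**(1 - iterations).
    """
    if abs(z) > 1.0:
        raise ValueError("this sketch assumes |z| <= 1 (e.g. a normalized weight)")
    y = 0.0
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0     # rotation direction from the residual's sign
        y += d * math.ldexp(x, -i)      # x * 2**-i: an arithmetic right shift in hardware
        z -= d * math.ldexp(1.0, -i)    # drive the residual toward zero
    return y


def cordic_mac(activations, weights, iterations: int) -> float:
    """Accumulate CORDIC products; `iterations` is the runtime accuracy knob."""
    return sum(cordic_multiply(a, w, iterations) for a, w in zip(activations, weights))


# Hypothetical mode settings: fewer iterations = faster, approximate mode.
acts, wts = [12.0, -7.0, 3.0], [0.40, -0.15, 0.90]
exact = sum(a * w for a, w in zip(acts, wts))
for n in (4, 8, 12):
    err = abs(cordic_mac(acts, wts, n) - exact)
    print(f"iterations={n:2d}  |error|={err:.5f}")
```

Tying the iteration count to operand width (for example, fewer iterations for 8-bit than for 16-bit weights) is one plausible mapping onto the paper's 8/16-bit modes; it is assumed here rather than taken from the paper.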
Findings
Up to 33% fewer computation cycles and 21% power savings per MAC stage in a 28 nm ASIC implementation.
Achieves 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency with a 256-PE configuration.
Demonstrates real-time object detection on a PynqZ2 FPGA with 154.6 ms latency at 0.43 W.
Abstract
This paper presents CARMEN, a runtime-adaptive, CORDIC-accelerated multi-precision vector engine for resource-efficient deep learning inference. The key insight is that CORDIC iteration depth directly governs computational accuracy, enabling dynamic switching between approximate and accurate execution modes without hardware modification. The architecture integrates a low-resource iterative CORDIC-based MAC unit with a time-multiplexed multi-activation function block, supporting flexible 8/16-bit precision and high hardware utilization. An ASIC implementation in 28 nm CMOS achieves up to 33% reduction in computation cycles and 21% power savings per MAC stage; a 256-PE configuration delivers 4.83 TOPS/mm² compute density and 11.67 TOPS/W energy efficiency. An FPGA deployment on the PynqZ2 validates real-time object detection at 154.6 ms latency and 0.43 W.
