FAR: Function-preserving Attention Replacement for IMC-friendly Inference

Yuxin Ren; Maxwell D Collins; Miao Hu; Huanrui Yang

arXiv:2505.21535 · cs.CV · May 18, 2026

TL;DR

FAR introduces a function-preserving attention replacement for transformers, enabling efficient IMC-compatible inference with minimal accuracy loss and reduced latency.

Contribution

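The paper proposes FAR, a framework that replaces self-attention in pretrained DeiTs with multi-head bidirectional LSTM modules via block-wise distillation, retaining model performance while targeting in-memory computing (IMC) hardware.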

Findings

1. FAR maintains comparable accuracy to original models on ImageNet.

2. FAR reduces model parameters and latency significantly.

3. Structured pruning enables resource adaptation without accuracy loss.

Abstract

While transformers dominate modern vision and language models, their attention mechanism remains poorly suited for in-memory computing (IMC) devices due to intensive activation-to-activation multiplications and non-local memory access, leading to substantial latency and bandwidth overhead on ReRAM-based accelerators. To address this mismatch, we propose FAR, a Function-preserving Attention Replacement framework that substitutes all attention in pretrained DeiTs with sequential modules inherently compatible with IMC dataflows. Specifically, FAR replaces self-attention with a multi-head bidirectional LSTM architecture via block-wise distillation to retain functional equivalence while enabling linear-time computation and localized weight reuse. We further incorporate structured pruning on FAR models, enabling flexible adaptation to resource-constrained IMC arrays while maintaining…
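The abstract's contrast between attention and a recurrent replacement can be made concrete with a rough operation count. The sketch below tallies multiply-accumulate operations (MACs) for one self-attention layer and one bidirectional LSTM layer, split into weight-to-activation products (which an IMC array can serve from stored weights) and activation-to-activation products (which it cannot). The function names, the hidden size, and the DeiT-like token count and width in the usage loop are illustrative assumptions, not values taken from the paper; softmax, biases, and normalization are ignored.

```python
def attention_macs(n_tokens: int, dim: int) -> dict:
    """Approximate MACs for one self-attention layer (all heads combined)."""
    # Q, K, V and output projections: weight-to-activation, linear in n_tokens.
    weight_act = 4 * n_tokens * dim * dim
    # QK^T and scores-times-V: activation-to-activation, quadratic in n_tokens.
    act_act = 2 * n_tokens * n_tokens * dim
    return {"weight_act": weight_act, "act_act": act_act}


def bilstm_macs(n_tokens: int, dim: int, hidden: int) -> dict:
    """Approximate MACs for one bidirectional LSTM layer (hidden units per direction)."""
    # Each time step computes 4 gates from the input (dim) and the previous state (hidden).
    per_step = 4 * hidden * (dim + hidden)
    # Two directions; every matrix product uses stored weights, linear in n_tokens.
    weight_act = 2 * n_tokens * per_step
    # Gating is elementwise, so there are no activation-to-activation matrix products.
    return {"weight_act": weight_act, "act_act": 0}


# Hypothetical sizes: 197 tokens and width 192 resemble a small DeiT; hidden=96 is arbitrary.
for n in (197, 788):
    print(n, attention_macs(n, 192), bilstm_macs(n, 192, 96))
```

Under these assumptions, quadrupling the token count multiplies the attention layer's activation-to-activation term by sixteen, while the recurrent layer's cost grows by four and stays entirely weight-stationary. The count says nothing about accuracy or about the sequential latency of recurrence, which the paper addresses separately.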

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Advanced Memory and Neural Computing

Methods: Linear Layer · Softmax · Multi-Head Attention · Dropout · Attention Is All You Need · Residual Connection · Layer Normalization · Dense Connections · Vision Transformer · Pruning