TL;DR
This paper introduces a novel theoretical framework and practical neuron design enabling high-performance ANN-to-SNN conversion with only a single timestep, significantly reducing inference latency and computational cost.
Contribution
The authors propose the Scale-and-Fire Neuron and the Temporal-to-Spatial Equivalence Theory, enabling effective single-timestep conversion of ANNs to SNNs with state-of-the-art results.
Findings
Achieved 88.8% top-1 accuracy on ImageNet-1K at T=1
Demonstrated state-of-the-art performance in image classification, detection, and segmentation
Reduced inference latency and computational overhead in SNNs
Abstract
Spiking Neural Networks (SNNs) are gaining attention as energy-efficient alternatives to Artificial Neural Networks (ANNs), especially in resource-constrained settings. While ANN-to-SNN conversion (ANN2SNN) achieves high accuracy without end-to-end SNN training, existing methods rely on large time steps, leading to high inference latency and computational cost. In this paper, we propose a theoretical and practical framework for single-timestep ANN2SNN. We establish the Temporal-to-Spatial Equivalence Theory, proving that multi-timestep integrate-and-fire (IF) neurons can be equivalently replaced by single-timestep multi-threshold neurons (MTN). Based on this theory, we introduce the Scale-and-Fire Neuron (SFN), which enables effective single-timestep (T=1) spiking through adaptive scaling and firing. Furthermore, we develop the SFN-based Spiking Transformer (SFormer), a specialized…
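The equivalence claimed in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation. It assumes a soft-reset IF neuron with zero initial potential, a constant input x per timestep bounded by the threshold θ, and a multi-threshold neuron whose thresholds are kθ for k = 1..T applied to the aggregated input T·x. The function names `if_spike_count` and `mtn_level` are hypothetical, and integer inputs are used to avoid floating-point rounding.

```python
def if_spike_count(x, theta, T):
    """Soft-reset integrate-and-fire neuron driven by constant input x for T steps.

    Returns the total number of spikes emitted (at most one per step).
    """
    v = 0  # membrane potential, assumed to start at zero
    spikes = 0
    for _ in range(T):
        v += x              # integrate the input
        if v >= theta:      # fire when the threshold is reached
            spikes += 1
            v -= theta      # soft reset: subtract the threshold
    return spikes


def mtn_level(x, theta, T):
    """Single-timestep multi-threshold neuron (assumed form).

    Receives the aggregated input T * x once and counts how many of the
    thresholds theta, 2*theta, ..., T*theta it reaches.
    """
    total = T * x
    return sum(1 for k in range(1, T + 1) if total >= k * theta)


theta, T = 8, 4
inputs = range(-3, theta + 1)  # negative inputs up to x == theta
if_counts = [if_spike_count(x, theta, T) for x in inputs]
mtn_levels = [mtn_level(x, theta, T) for x in inputs]
assert if_counts == mtn_levels  # same output level for every tested input
```

Under these assumptions both neurons compute the same clipped, floored value of T·x/θ, which is also why the reviewers compare the approach to activation quantization.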
Peer Reviews
Decision·Submitted to ICLR 2026
The writing of this paper is fluent, with a well-structured organization. All the theories claimed in the paper have been properly illustrated and elaborated.
1. **Question on the correctness of Theorem 1**: Theorem 1 requires that "the input is bounded by θ", which is a rather strong constraint. How can this constraint be guaranteed, given that the input depends on both the input activations and the weights? 2. **Clarification on the definition of variables and equivalence of outputs**: What is the meaning of \( o_M \) in Equation (27)? Is it consistent with the definition of \( o(t) \) in Equation (6)? It is noted that \(
1. The authors attempt to provide a rigorous theoretical underpinning for their single-timestep approach through the "Temporal-to-Spatial Equivalence Theory". Grounding the methodology in a formal equivalence (even under ideal conditions) is a commendable effort that adds depth and clarity to the proposed conversion framework. 2. The proposed Scale-and-Fire Neuron (SFN) is a well-motivated design. It moves beyond a naive multi-threshold implementation by incorporating a scaling factor (λ) and
1. In Table 2, the performance comparison between different architectures is clearly unfair; the authors should compare with other ANN2SNN methods using the same model architecture. 2. Since the model uses multiple thresholds and only a single time step, it is closer to a model with activation-only quantization. Therefore, a comparison with typical quantization methods such as [1] is also necessary. 3. The citations of all existing other methods on all performance comparison tables are
The proposed SFN enables converted SNNs to achieve good performance with only 1 timestep.
1. The proposed Scale-and-Fire Neuron (SFN) largely mirrors the non-uniform activation quantization with calibration used in quantized ANNs. The so-called "Temporal-to-Spatial Equivalence Theory" is quite obvious and superficial. The resulting SFN actually transmits floating-point values and relies heavily on intricate searches for an appropriate scaling factor, which is unlikely to be a fixed constant that is completely task-independent and model-independent (as also acknowledged by the authors
