STAF: Sinusoidal Trainable Activation Functions for Implicit Neural Representation
Alireza Morsali, MohammadJavad Vaez, Mohammadhossein Soltani, Amirhossein Kazerouni, Babak Taati, Morteza Mohammad-Noori

TL;DR
STAF introduces trainable sinusoidal activation functions that adaptively learn spectral components, significantly enhancing the accuracy and efficiency of implicit neural representations across various signal and inverse problem tasks.
Contribution
The paper proposes STAF, a novel trainable sinusoidal activation function that improves spectral learning and representation in neural networks, addressing limitations of spectral bias in INRs.
Findings
STAF outperforms state-of-the-art methods in signal reconstruction tasks.
It accelerates convergence and enhances expressivity in neural representations.
It is effective across diverse applications, including shape, image, and audio representation and NeRF.
Abstract
Implicit Neural Representations (INRs) have emerged as a powerful framework for modeling continuous signals. The spectral bias of ReLU-based networks is a well-established limitation, restricting their ability to capture fine-grained details in target signals. While previous works have attempted to mitigate this issue through frequency-based encodings or architectural modifications, these approaches often introduce additional complexity and do not fully address the underlying challenge of learning high-frequency components efficiently. We introduce Sinusoidal Trainable Activation Functions (STAF), designed to directly tackle this limitation by enabling networks to adaptively learn and represent complex signals with higher precision and efficiency. STAF inherently modulates its frequency components, allowing for self-adaptive spectral learning. This capability significantly improves…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1) The paper is easy to understand and well written. 2) The proposed approach goes beyond the conventional SIREN architecture by making the activation parameters learnable, so that they adapt to the given image, audio, or shape. 3) Instead of using one sinusoid at a specific frequency, STAF uses a linear combination of sinusoids (a Fourier approach). 4) The mathematical rigor of the paper is notable, and STAF aims to prevent issues related to vanishing or exploding gra
I believe the core of the paper is good, but the current version lacks extensive experimental results. Please see the following points. 1) The experimental results are shown only for representation tasks (image, occupancy, and audio). Extending the proposed approach to inverse tasks would demonstrate the method's generalization ability; for instance, how does STAF perform on inverse vision tasks such as inpainting or NeRFs? 2) The image representation results have been o
1. The choice of activation functions for each layer's neurons significantly impacts the performance of INRs. SIREN demonstrated that periodic activation functions are effective in INRs by utilizing fixed sine functions. In contrast, STAF employs periodic activation functions that are dynamically trained with the MLP's weights and biases during the training process, enhancing the network's ability to better capture and reconstruct high-frequency signals. This design is similar to Fourier series,
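The contrast drawn above, a fixed sine versus a trainable combination of sinusoids, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names, the per-term amplitude, frequency, and phase parameterization, and the numeric values are all assumptions made for the example.

```python
import math

def sine_sum_activation(x, amplitudes, frequencies, phases):
    # Linear combination of sinusoids: sum_i a_i * sin(w_i * x + p_i).
    # The (amplitude, frequency, phase) parameterization is an assumption.
    return sum(a * math.sin(w * x + p)
               for a, w, p in zip(amplitudes, frequencies, phases))

def sine_sum_gradients(x, amplitudes, frequencies, phases):
    # Partial derivatives of the activation with respect to each parameter.
    # These are what a gradient-based optimizer would use to train them.
    terms = list(zip(amplitudes, frequencies, phases))
    grads_a = [math.sin(w * x + p) for _, w, p in terms]
    grads_w = [a * x * math.cos(w * x + p) for a, w, p in terms]
    grads_p = [a * math.cos(w * x + p) for a, w, p in terms]
    return grads_a, grads_w, grads_p

# Hypothetical parameter values for a three-term activation.
amplitudes = [1.0, 0.5, 0.25]
frequencies = [1.0, 3.0, 9.0]
phases = [0.0, 0.0, 0.0]

y = sine_sum_activation(0.3, amplitudes, frequencies, phases)
grads_a, grads_w, grads_p = sine_sum_gradients(0.3, amplitudes, frequencies, phases)

# With a single term, unit amplitude, and zero phase, the activation
# reduces to a fixed-frequency sine of the kind SIREN uses.
single = sine_sum_activation(0.3, [1.0], [30.0], [0.0])
```

Because every parameter has a well-defined derivative, the activation's spectral content can be updated alongside the network's weights and biases, which is the property the review highlights.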
Models utilizing STAF are more efficient than SIREN, as they can achieve higher performance with fewer parameters. This is because the activation function leverages frequency-related parameters to combine multiple sine functions, resulting in greater expressive power. However, as the frequency-related parameter (nf in the code) increases, the computational cost rises rapidly. This can slow down training and require more computational resources and memory. Therefore, when using STAF, it is import
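As a rough illustration of the cost concern raised above, the sketch below counts sine evaluations for one layer as the number of sinusoidal terms grows. It assumes each neuron evaluates `nf` sines per input sample; the function name, batch size, and layer width are hypothetical, and real wall-clock time and memory use depend on the implementation.

```python
def sine_evaluations(batch_size, layer_width, nf):
    # Assumed cost model: every neuron evaluates nf sines per input sample.
    return batch_size * layer_width * nf

# Hypothetical batch size and layer width, chosen only for illustration.
baseline = sine_evaluations(4096, 256, 1)
for nf in (1, 2, 5, 10):
    cost = sine_evaluations(4096, 256, nf)
    print(f"nf={nf:>2}: {cost:>10,} sine evaluations ({cost // baseline}x baseline)")
```

Under this simple count the activation work grows in proportion to `nf`, and the intermediate values kept for backpropagation would plausibly grow with it, so `nf` trades expressive power against training cost.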
- Strong PSNR increase.
- Strong method with minimal parameter overhead. The authors seem to have studied the problem well to come up with this method.
- Personal suggestion that will *not* affect my final vote: I would recommend moving the audio representation into the main paper and the proof of the initialization scheme to the end, as the experiments speak for themselves. I find the math provides little to no value, but once again this is personal preference.
Baselines feel weak:
- For images there’s SPDER (https://arxiv.org/pdf/2306.15242), which claims to be SOTA for image representation and is very similar; it should be compared.
- 3D shape representation should have much better baselines than SIREN. Quick online search: DeepSDF, Occupancy Networks, IM-NET. I suggest the authors try to compare to at least 1 of these published in the last 2 years.
- It is not meaningful to keep mentioning that you improve spectral bias over ReLU networks; I think those i
+ Very well-written paper.
+ Presents the math in a very precise and clear way.
+ Found an important issue in the original SIREN paper, which made some wrong assumptions.
+ Efficiency of the proposed overall approach for image representation using STAF.
- The main limitation of this work is that the experiments are very weak. One cannot draw a clear conclusion from what is presented in the paper. For example, results are presented on only a handful of images (2-3 images). It is not clear whether the results also hold for different types of images. Most examples presented in the paper are grayscale-like images. A detailed evaluation on a wide variety of images, with proper comparison against other methods, would make the paper stronger. Otherwis
Taxonomy
Topics: Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
