Positional Embedding-Aware Activations

Kathan Shah; Chawin Sitawarin

arXiv:2306.15242·cs.CV·January 15, 2026·1 cites

Open Access

TL;DR

This paper introduces SPDER, a simple MLP architecture whose activation function lets the network learn positional embeddings on its own, speeding up training and improving representation quality for images and audio without hyperparameter tuning.

Contribution

The paper proposes a simple MLP with a damped sinusoidal activation function (a sinusoid multiplied by a sublinear damping function) that lets the network learn positional embeddings naturally and overcome the spectral bias toward lower frequencies, achieving state-of-the-art results in image and audio representation.

Findings

01 Speeds up training by 10x

02 Achieves 1,500-50,000x lower loss than previous methods in image representation

03 Excels in downstream tasks like super-resolution and video interpolation

Abstract

We present a neural network architecture designed to naturally learn a positional embedding and overcome the spectral bias towards lower frequencies faced by conventional activation functions. Our proposed architecture, SPDER, is a simple MLP that uses an activation function composed of a sinusoidal multiplied by a sublinear function, called the damping function. The sinusoidal enables the network to automatically learn the positional embedding of an input coordinate while the damping passes on the actual coordinate value by preventing it from being projected down to within a finite range of values. Our results indicate that SPDERs speed up training by 10x and converge to losses 1,500-50,000x lower than that of the state-of-the-art for image representation. SPDER is also state-of-the-art in audio representation. The superior representation capability allows SPDER to also excel on…
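The activation described in the abstract can be illustrated with a minimal sketch. The abstract only specifies "a sinusoidal multiplied by a sublinear function"; the choice of sqrt(|x|) as the damping function, the layer sizes, the initialization, and the helper names `damped_sine`, `init_mlp`, and `mlp_forward` are all assumptions made here for illustration and are not taken from the text above.

```python
import numpy as np

def damped_sine(x):
    """Sinusoid multiplied by a sublinear damping term.

    sqrt(|x|) is an assumed damping function; the abstract only
    requires the damping to be sublinear.
    """
    x = np.asarray(x, dtype=np.float64)
    return np.sin(x) * np.sqrt(np.abs(x))

def init_mlp(sizes, seed=0):
    """Create (weight, bias) pairs for a plain MLP (hypothetical init scheme)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / np.sqrt(fan_in)
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def mlp_forward(params, coords):
    """Map raw input coordinates to outputs; no separate positional encoding step."""
    h = np.asarray(coords, dtype=np.float64)
    for W, b in params[:-1]:
        h = damped_sine(h @ W + b)  # hidden layers use the damped sinusoid
    W, b = params[-1]
    return h @ W + b                # linear output layer

# Usage: evaluate an untrained network on an 8x8 grid of pixel coordinates.
ys, xs = np.meshgrid(np.linspace(-1, 1, 8), np.linspace(-1, 1, 8), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=-1)  # shape (64, 2)
params = init_mlp([2, 32, 32, 3])
rgb = mlp_forward(params, coords)                     # shape (64, 3)
```

Unlike a plain sine, whose output always lies in [-1, 1], the damped version grows in magnitude with its input, which is one way to read the abstract's remark that the damping prevents the coordinate value from being projected into a finite range.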

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings