Representing Sounds as Neural Amplitude Fields: A Benchmark of Coordinate-MLPs and A Fourier Kolmogorov-Arnold Framework
Linfei Li, Lin Zhang, Zhong Wang, Fengyi Zhang, Zelin Li, and Ying Shen

TL;DR
This paper benchmarks Coordinate-MLPs for audio representation, identifies their limitations, and introduces Fourier-ASR, a Fourier-based neural framework that robustly models complex audio signals without extensive tuning.
Contribution
The paper provides the first benchmark of Coordinate-MLPs for audio, proposes the Fourier-ASR framework built on Fourier-KAN, and introduces FaLS to improve high-frequency learning.
Findings
Coordinate-MLPs require complex hyperparameter tuning for audio.
Fourier-ASR effectively models complex audio signals without tuning.
Positional encoding improves Coordinate-MLP audio quality.
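As a rough illustration of what positional encoding means for a 1D audio coordinate, the sketch below maps each time coordinate to sine and cosine features before it would be fed to an MLP. The NeRF-style frequency schedule (2^k * pi), the [-1, 1] coordinate range, and the names positional_encoding and sample_coordinates are assumptions made for this example; the paper benchmarks three encoding types that are not specified in this summary.

```python
import math


def positional_encoding(t, num_freqs=4):
    """Map a scalar time coordinate t to sin/cos features.

    Assumed form (NeRF-style): frequencies 2^k * pi for k = 0..num_freqs-1.
    This is one common encoding, not necessarily one used in the paper.
    """
    features = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        features.append(math.sin(freq * t))
        features.append(math.cos(freq * t))
    return features


def sample_coordinates(num_samples):
    """Evenly spaced time coordinates in [-1, 1], one per audio sample."""
    if num_samples == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (num_samples - 1) for i in range(num_samples)]


# Encode five sample positions with three frequency bands each.
coords = sample_coordinates(5)
encoded = [positional_encoding(t, num_freqs=3) for t in coords]
```

Each coordinate becomes a vector of 2 * num_freqs features, so a Coordinate-MLP sees explicit high-frequency inputs instead of a single raw time value.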
Abstract
Although Coordinate-MLP-based implicit neural representations have excelled in representing radiance fields, 3D shapes, and images, their application to audio signals remains underexplored. To fill this gap, we investigate existing implicit neural representations, from which we extract 3 types of positional encoding and 16 commonly used activation functions. Through combinatorial design, we establish the first benchmark for Coordinate-MLPs in audio signal representations. Our benchmark reveals that Coordinate-MLPs require complex hyperparameter tuning and frequency-dependent initialization, limiting their robustness. To address these issues, we propose Fourier-ASR, a novel framework based on the Fourier series theorem and the Kolmogorov-Arnold representation theorem. Fourier-ASR introduces Fourier Kolmogorov-Arnold Networks (Fourier-KAN), which leverage periodicity and strong…
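To make the Fourier-KAN idea named in the abstract more concrete, here is a minimal sketch of one common formulation of such a layer: every input-output edge carries a learnable truncated Fourier series rather than a scalar weight followed by a fixed activation. The class name FourierKANLayer, the initialization scale, and the choice of harmonics 1..K are assumptions for illustration; the paper's actual layer may differ.

```python
import math
import random


class FourierKANLayer:
    """Hypothetical Fourier-KAN layer (illustrative only).

    Output j is: bias_j + sum over inputs i and harmonics k of
        a[j][i][k] * cos(k * x_i) + b[j][i][k] * sin(k * x_i)
    """

    def __init__(self, in_dim, out_dim, num_harmonics, seed=0):
        rng = random.Random(seed)
        # Assumed initialization scale; keeps output variance roughly bounded.
        scale = 1.0 / math.sqrt(in_dim * num_harmonics)
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.num_harmonics = num_harmonics
        self.cos_w = [[[rng.gauss(0.0, scale) for _ in range(num_harmonics)]
                       for _ in range(in_dim)] for _ in range(out_dim)]
        self.sin_w = [[[rng.gauss(0.0, scale) for _ in range(num_harmonics)]
                       for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def forward(self, x):
        """Evaluate the layer on one input vector x (list of floats)."""
        out = []
        for j in range(self.out_dim):
            total = self.bias[j]
            for i in range(self.in_dim):
                for k in range(1, self.num_harmonics + 1):
                    total += self.cos_w[j][i][k - 1] * math.cos(k * x[i])
                    total += self.sin_w[j][i][k - 1] * math.sin(k * x[i])
            out.append(total)
        return out


# One time coordinate in, two features out, three harmonics per edge.
layer = FourierKANLayer(in_dim=1, out_dim=2, num_harmonics=3)
y = layer.forward([0.5])
y_shifted = layer.forward([0.5 + 2.0 * math.pi])
```

Because every term is a sine or cosine with an integer harmonic, the layer is 2*pi-periodic in each input, which is the kind of periodicity the abstract refers to; y and y_shifted agree up to floating-point error.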
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Hearing Loss and Rehabilitation · Speech and Audio Processing
