Improving Set Function Approximation with Quasi-Arithmetic Neural Networks

Tomas Tokar; Scott Sanner

arXiv:2602.04941·cs.LG·February 6, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Quasi-Arithmetic Neural Networks (QUANNs), a novel learnable aggregation method for set data that enhances expressivity and transferability, supported by theoretical universality and superior empirical performance.

Contribution

The paper proposes the Neuralized Kolmogorov Mean (NKM) and QUANNs, advancing set function approximation with a trainable, invertible aggregation framework that improves over fixed pooling methods.
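For orientation, the classical Kolmogorov (quasi-arithmetic) mean that the NKM builds on can be written as below. This is the standard textbook definition, not the paper's exact parameterization, which is not reproduced on this page; the symbol $f$ stands for any continuous, strictly monotone (hence invertible) generator function.

```latex
\[
  M_f(x_1,\dots,x_n) \;=\; f^{-1}\!\left(\frac{1}{n}\sum_{i=1}^{n} f(x_i)\right)
\]
% Familiar special cases of the generator f:
%   f(x) = x        -> arithmetic mean
%   f(x) = \log x   -> geometric mean  (x_i > 0)
%   f(x) = 1/x      -> harmonic mean   (x_i > 0)
%   f(x) = x^p      -> power mean      (x_i > 0, p \neq 0)
```

Because the inner sum does not depend on the order of the $x_i$, any such mean is permutation invariant, which is the property a set aggregator needs.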

Findings

01. QUANNs outperform state-of-the-art baselines on various benchmarks.

02. QUANNs learn transferable embeddings effective even outside set tasks.

03. Theoretical proof of universality for broad set-function classes.

Abstract

Sets represent a fundamental abstraction across many types of data. To handle the unordered nature of set-structured data, models such as DeepSets and PointNet rely on fixed, non-learnable pooling operations (e.g., sum or max) -- a design choice that can hinder the transferability of learned embeddings and limit model expressivity. More recently, learnable aggregation functions have been proposed as more expressive alternatives. In this work, we advance this line of research by introducing the Neuralized Kolmogorov Mean (NKM) -- a novel, trainable framework for learning a generalized measure of central tendency through an invertible neural function. We further propose quasi-arithmetic neural networks (QUANNs), which incorporate the NKM as a learnable aggregation function. We provide a theoretical analysis showing that QUANNs are universal approximators for a broad class of common…
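To make the idea of a "generalized measure of central tendency through an invertible function" concrete, here is a minimal sketch of a quasi-arithmetic mean: apply an invertible function f to each element, average, then map back with the inverse of f. It uses fixed closed-form generators and a one-parameter power family as stand-ins; the paper's NKM replaces f with a trainable invertible neural function, which this sketch does not attempt to reproduce. The names `quasi_arithmetic_mean` and `power_mean` are illustrative, not taken from the paper.

```python
import math
from typing import Callable, Iterable


def quasi_arithmetic_mean(
    xs: Iterable[float],
    f: Callable[[float], float],
    f_inv: Callable[[float], float],
) -> float:
    """Return f_inv(mean(f(x) for x in xs)); f must be invertible on the inputs."""
    items = list(xs)
    if not items:
        raise ValueError("quasi-arithmetic mean of an empty set is undefined")
    return f_inv(sum(f(x) for x in items) / len(items))


def power_mean(xs: Iterable[float], p: float) -> float:
    """Quasi-arithmetic mean with generator f(x) = x**p (positive inputs, p != 0).

    The exponent p plays the role of a single tunable parameter here; it is
    only a stand-in for a learned invertible function (an assumption of this sketch).
    """
    return quasi_arithmetic_mean(xs, lambda x: x ** p, lambda y: y ** (1.0 / p))


values = [1.0, 2.0, 4.0]

# Different generators recover different classical means.
arithmetic = quasi_arithmetic_mean(values, lambda x: x, lambda y: y)
geometric = quasi_arithmetic_mean(values, math.log, math.exp)
harmonic = quasi_arithmetic_mean(values, lambda x: 1.0 / x, lambda y: 1.0 / y)

# A large exponent pushes the power mean toward max pooling.
near_max = power_mean(values, 50.0)

print(arithmetic, geometric, harmonic, near_max)
```

The point of the sketch is that one functional form covers several fixed pooling choices, so changing the generator (here by hand, in the paper by training) changes the aggregator without giving up permutation invariance.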

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The idea to parameterize the Kolmogorov mean function and replace the standard mean pooling in existing set neural network architectures is intuitive and interesting. Empirically, the resulting QUANN architecture outperforms all baselines. It is good to see that a simple and principled modification leads to effective performance gain. The experiments are well designed to empirically verify the authors' hypotheses on the practical advantages of QUANNs.

Weaknesses

Writing can be improved.
- Section 5.2 has a lot of references to the theorems in the appendix. It is difficult to follow the discussion in this section without having to read the theorems in the appendix. I think it would be helpful to at least state some informal and short versions of the theorems here.
- Please use citet{…} and citep{…} appropriately, e.g., use citep{…} for references that appear in lines 28-32. There are many misuses of citet{…} or cite{…} throughout the entire paper, plea

Reviewer 02 · Rating 6 · Confidence 4

Strengths

The paper is well written and clear. The choice of the Kolmogorov mean and its neuralized version is well motivated. There is a sound discussion of the theoretical advantage of the proposed solution over alternatives, complemented with promising experimental results.

Weaknesses

There were a few works that explicitly addressed the problem of learning pooling operators in the past:
- Euan Ong, Petar Veličković, Learnable Commutative Monoids for Graph Neural Networks, LOG 2022.
- P. Zuidberg Dos Martires, Neural Semirings, NeSy 2021.
- G Pellegrini, A Tibo, P Frasconi, A Passerini, M Jaeger, Learning aggregation functions, IJCAI 2021.
While none of them directly suggests using the Kolmogorov mean, they all attempt to go beyond predefined aggregators, and at least one (neu

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The formulation of element-wise representation into the multiset attention is theoretically interesting. The integration of this multiset attention into MBC processing is practically relevant. The experimental results are convincing: performance is good across tasks, and stability w.r.t. mini-batch size is good.

Weaknesses

The results and discussion read as if the proposed method has only advantages. What are the limitations, and where does UST not perform well? There is a discussion section, but it does not reflect on the possible disadvantages of UST. Multiset attention is presented here as completely new, yet "Unlocking Slot Attention by Changing Optimal Transport Costs" [1] also applies attention across the elements in the multiset. The reference is missing in the related work.

Taxonomy

Topics: Advanced Graph Neural Networks · Topological and Geometric Data Analysis · Adversarial Robustness in Machine Learning