Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity

Moussa Koulako Bala Doumbouya; Dan Jurafsky; Christopher D. Manning

arXiv:2506.11035·cs.LG·October 14, 2025

TL;DR

This paper introduces a differentiable Tversky similarity model for deep learning, improving interpretability and performance in image and language tasks by aligning more closely with human psychological similarity perception.

Contribution

It develops a learnable, differentiable Tversky similarity layer for neural networks, enabling non-linear feature comparisons and enhancing interpretability and accuracy.

Findings

01

24.7% accuracy improvement on NABirds with Tversky layer

02

7.8% perplexity reduction in GPT-2 on PTB

03

34.8% parameter reduction in GPT-2

Abstract

Work in psychology has highlighted that the geometric model of similarity standard in deep learning is not psychologically plausible because its metric properties such as symmetry do not align with human perception of similarity. In contrast, Tversky (1977) proposed an axiomatic theory of similarity with psychological plausibility based on a representation of objects as sets of features, and their similarity as a function of their common and distinctive features. This model of similarity has not been used in deep learning before, in part because of the challenge of incorporating discrete set operations. In this paper, we develop a differentiable parameterization of Tversky's similarity that is learnable through gradient descent, and derive basic neural network building blocks such as the Tversky projection layer, which unlike the linear projection layer can model non-linear functions…
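To make the set-based view in the abstract concrete, here is a minimal sketch of Tversky's (1977) contrast model, in which similarity rises with common features and falls with each object's distinctive features. It is not the paper's differentiable parameterization. The function name `tversky_contrast`, the use of set cardinality as the salience measure, the weights `theta`, `alpha`, `beta`, and the example feature sets are all assumptions chosen for illustration.

```python
def tversky_contrast(a, b, theta=1.0, alpha=0.8, beta=0.2):
    """Contrast-model similarity of a to b over discrete feature sets.

    Assumption: salience of a feature set is simply its size.
    theta weights common features; alpha and beta weight the
    features found only in a and only in b, respectively.
    """
    a, b = set(a), set(b)
    common = len(a & b)   # features shared by both objects
    a_only = len(a - b)   # features distinctive to a
    b_only = len(b - a)   # features distinctive to b
    return theta * common - alpha * a_only - beta * b_only


# Hypothetical feature sets: a sparse object and a richer one.
son = {"human", "male", "young"}
father = {"human", "male", "adult", "parent", "employed"}

s_son_father = tversky_contrast(son, father)   # son compared to father
s_father_son = tversky_contrast(father, son)   # father compared to son

# With alpha != beta the two directions differ, unlike a metric distance.
print(s_son_father, s_father_son)

# Equal weights on the distinctive terms restore symmetry.
sym_1 = tversky_contrast(son, father, alpha=0.5, beta=0.5)
sym_2 = tversky_contrast(father, son, alpha=0.5, beta=0.5)
```

Because `alpha` is larger than `beta` here, the subject's distinctive features are penalized more, so the sparser object is judged more similar to the richer one than the reverse. This is the kind of asymmetry that a geometric distance cannot express.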

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- Original and well-motivated idea connecting human similarity judgments to machine-learned representations. Clear novelty: the first differentiable implementation of Tversky similarity for neural networks.
- The paper is well-written, clearly structured, and engaging to read. I learned a lot.
- The XOR demonstration (Section 3.1, Figure 1) with its thorough validation convincingly shows that a single Tversky projection can model non-linear decision boundaries without composition with activation

Weaknesses

- Gradient flow through indicator functions: Equations 2-5 rely on indicator functions [a·fₖ > 0], which have zero derivative almost everywhere. It's unclear how gradients propagate during training. Are you using straight-through estimators, sigmoid approximations, or another approach? Providing implementation details or code would help readers understand your work.
- Convergence of training: Tables 4-7 show convergence failure rates of 47-77% across many hyperparameter settings, with some producing NaN
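The gradient-flow concern can be illustrated numerically with a minimal sketch that is not taken from the paper or its code. It assumes the indicator multiplies the score it gates, so that the gated term behaves like a ReLU; whether the paper's Equations 2-5 have this form cannot be confirmed from this page. The helper names `indicator`, `gated`, and `central_diff` are hypothetical.

```python
def indicator(x):
    """Hard indicator [x > 0]; piecewise constant."""
    return 1.0 if x > 0 else 0.0


def gated(x):
    """Score multiplied by its own indicator, x * [x > 0] (a ReLU)."""
    return x * indicator(x)


def central_diff(fn, x, h=1e-4):
    """Central finite-difference estimate of d fn / d x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


# Sample points chosen away from the discontinuity at 0.
points = [-1.5, -0.3, 0.4, 2.0]

grad_indicator = [central_diff(indicator, x) for x in points]
grad_gated = [central_diff(gated, x) for x in points]

print(grad_indicator)  # slope of the bare indicator at each point
print(grad_gated)      # slope of the gated score at each point
```

Under this assumption, the bare indicator contributes no gradient, while the gated score passes a gradient wherever the score is positive. If the indicators in the paper stand alone rather than gating a score, a surrogate such as a straight-through estimator or sigmoid relaxation would be needed, as the reviewer suggests.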

Reviewer 02 · Rating 8 · Confidence 4

Strengths

* Strong originality of the entire prototype learning framework based on learned asymmetric comparison, which can be integrated into end-to-end training of various deep architectures.
* The asymmetry weights, prototype dimensionality, and the feature bank are treated as learnable (free) parameters.
* Demonstrates improved performance on several benchmarks, including NABirds, indicating applicability to problems with hundreds of classes. While there is no demonstration of extension to thous

Weaknesses

Minor weaknesses:
* It would be nice to have a small ablation experiment isolating the specific contribution of asymmetry by setting the asymmetry weights to zero and learning only the overlap (intersection) term in Equation 1.
* The ablation could be extended to learn only the asymmetry terms, with the overlap term set to zero. Taken together, the two settings would show whether performance gains rely mainly on similarities or on differences.
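The two ablation settings suggested here can be sketched on the classic set-based contrast model. This is only an illustration of what each setting removes, not the paper's Equation 1, which is not reproduced on this page. The function `contrast`, the weight triples, and the feature sets are assumptions.

```python
def contrast(a, b, theta, alpha, beta):
    """Set-based contrast model: weighted overlap minus weighted differences."""
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


# Hypothetical feature sets for two bird classes.
x = {"wings", "beak", "red"}
y = {"wings", "beak", "blue", "crest", "long_tail"}

# (theta, alpha, beta) for the full model and the two ablations.
configs = {
    "full": (1.0, 0.8, 0.2),
    "overlap_only": (1.0, 0.0, 0.0),      # asymmetry weights set to zero
    "difference_only": (0.0, 0.8, 0.2),   # overlap weight set to zero
}

# Score both directions so any asymmetry is visible.
results = {
    name: (contrast(x, y, *w), contrast(y, x, *w))
    for name, w in configs.items()
}

for name, (forward, backward) in results.items():
    print(name, forward, backward)
```

In this toy setting the overlap-only variant is necessarily symmetric, while the difference-only variant keeps the asymmetry but ignores shared features, which is why comparing the two would separate the contributions of similarities and differences.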

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- Clear and well-founded scientific motivation for the proposed novel methods. The idea of using Tversky similarity for deep learning is, to my knowledge, a novel contribution.
- Empirical confirmation of the proposed neuralized Tversky approach, in both synthetic experiments and large-scale studies.
- The paper includes an honest and balanced discussion section and transparently raises limitations throughout the paper.
- The experiments appear reproducible and are for the most part clearly describ

Weaknesses

- While I like the motivation of Section 3.1 (XOR), the section and associated Figure 1 read as rushed and can be confusing in their current state. I think the main take-home is that the Tversky projection layer can in principle learn the XOR function. I would suggest focusing on introducing the problem clearly (which is currently done in the caption), and then discussing some of the limitations encountered during optimization, i.e., convergence and hyperparameter selection.
- While I think the experime

Code & Models

Models

Blackroot/Tversky-All-Test-100MIsh (Hugging Face model)

Taxonomy

Topics · Neural Networks and Applications