RTify: Aligning Deep Neural Networks with Human Behavioral Decisions
Yu-Ang Cheng, Ivan Felipe Rodriguez, Sixuan Chen, Kohitij Kar, Takeo Watanabe, Thomas Serre

TL;DR
This paper introduces RTify, a framework that aligns neural network decision dynamics with human reaction times, improving models of human visual decision-making by integrating temporal behavioral data.
Contribution
The paper presents a novel method to constrain RNNs with human RTs, enabling better modeling of human perceptual decision dynamics and integrating these with visual processing models.
Findings
The model accurately predicts human reaction times across experiments.
The approach enables optimization of speed-accuracy tradeoffs in decision models.
Integration with CNNs improves natural image decision modeling.
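The speed-accuracy tradeoff mentioned in the findings can be illustrated with a classic bounded evidence accumulator (drift-diffusion style). This is a minimal sketch under stated assumptions, not the paper's RNN: the function name `simulate` and the drift, noise, and threshold values are all hypothetical and chosen only to make the trend visible.

```python
import numpy as np

def simulate(threshold, drift=0.1, noise=1.0, n_trials=2000, max_steps=200, seed=0):
    """Simulate a two-choice accumulator with symmetric bounds at +/- threshold.

    The correct choice is the positive bound (drift > 0). All parameter
    values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Noisy momentary evidence for every trial and time step.
    increments = drift + noise * rng.standard_normal((n_trials, max_steps))
    evidence = np.cumsum(increments, axis=1)

    # First step at which the accumulated evidence reaches either bound.
    crossed = np.abs(evidence) >= threshold
    has_crossed = crossed.any(axis=1)
    # Trials that never reach a bound are forced to decide at the last step.
    first = np.where(has_crossed, np.argmax(crossed, axis=1), max_steps - 1)

    final = evidence[np.arange(n_trials), first]
    accuracy = float(np.mean(final > 0))      # fraction of correct choices
    mean_steps = float(np.mean(first + 1))    # mean decision time in steps
    return accuracy, mean_steps

acc_low, steps_low = simulate(threshold=1.0)
acc_high, steps_high = simulate(threshold=5.0)
print(f"threshold=1.0: accuracy={acc_low:.2f}, mean steps={steps_low:.1f}")
print(f"threshold=5.0: accuracy={acc_high:.2f}, mean steps={steps_high:.1f}")
```

Raising the threshold makes the simulated observer slower but more accurate, which is the tradeoff an "ideal-observer" model would have to balance.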
Abstract
Current neural network models of primate vision focus on replicating overall levels of behavioral accuracy, often neglecting perceptual decisions' rich, dynamic nature. Here, we introduce a novel computational framework to model the dynamics of human behavioral choices by learning to align the temporal dynamics of a recurrent neural network (RNN) to human reaction times (RTs). We describe an approximation that allows us to constrain the number of time steps an RNN takes to solve a task with human RTs. The approach is extensively evaluated against various psychophysics experiments. We also show that the approximation can be used to optimize an "ideal-observer" RNN model to achieve an optimal tradeoff between speed and accuracy without human data. The resulting model is found to account well for human RT data. Finally, we use the approximation to train a deep learning implementation of…
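To make the idea of constraining the number of time steps with human RTs concrete, here is a minimal sketch, assuming a model that emits a scalar evidence increment at each time step and commits to a decision once the running sum reaches a threshold. The helper `decision_steps`, the evidence values, and the RT values are all hypothetical. The hard threshold crossing shown here is not differentiable, which is presumably why the abstract speaks of an approximation; that approximation is not reproduced here.

```python
import numpy as np

def decision_steps(evidence_increments, threshold):
    """Return the 1-indexed step at which cumulative evidence first reaches
    the threshold. Trials that never reach it get the maximum step count."""
    cumulative = np.cumsum(evidence_increments, axis=1)  # (trials, steps)
    crossed = cumulative >= threshold
    n_steps = evidence_increments.shape[1]
    first = np.argmax(crossed, axis=1) + 1
    first[~crossed.any(axis=1)] = n_steps
    return first

# Hypothetical per-step evidence for three trials: easy, medium, hard.
increments = np.array([
    [0.5, 0.5, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.1, 0.1, 0.1],
])
steps = decision_steps(increments, threshold=1.0)

# Hypothetical human reaction times (ms) for the same three trials.
human_rt_ms = np.array([400.0, 650.0, 700.0])

# One simple alignment measure: correlation between model steps and human RTs.
alignment = np.corrcoef(steps, human_rt_ms)[0, 1]
print(steps, round(float(alignment), 3))
```

A training objective could then push this alignment upward, but how the paper does so is not described in the truncated abstract above.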
Taxonomy
Topics: Neural dynamics and brain function · Visual perception and processing mechanisms · Face Recognition and Perception
Methods: ALIGN · Focus · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
