Neuro-Symbolic Decoding of Neural Activity
Yanchen Wang, Joy Hsu, Ehsan Adeli, Jiajun Wu

TL;DR
NEURONA is a neuro-symbolic framework that improves decoding of fMRI data by integrating symbolic reasoning with neural activity patterns, enhancing accuracy and generalization in concept grounding from visual stimuli.
Contribution
This paper introduces NEURONA, a novel neuro-symbolic approach that combines symbolic reasoning with neural decoding to better interpret fMRI responses and generalize to unseen queries.
Findings
Incorporating structural priors improves decoding accuracy.
The framework generalizes well to unseen queries.
Neuro-symbolic methods are promising for neural activity understanding.
Abstract
We propose NEURONA, a neuro-symbolic framework for fMRI decoding and concept grounding in neural activity. Leveraging image- and video-based fMRI question-answering datasets, NEURONA learns to decode interacting concepts from visual stimuli based on patterns of fMRI responses, integrating symbolic reasoning and compositional execution with fMRI grounding across brain regions. We demonstrate that incorporating structural priors (e.g., compositional predicate-argument dependencies between concepts) into the decoding process significantly improves both decoding accuracy over precise queries, and notably, generalization to unseen queries at test time. With NEURONA, we highlight neuro-symbolic frameworks as promising tools for understanding neural activity.
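The abstract's central idea is to score a query by composing concept-level evidence along its predicate–argument structure rather than treating the whole query as one label. The sketch below illustrates that idea under stated assumptions. It is not NEURONA's implementation. The logit dictionaries are hypothetical stand-ins for per-concept scores an fMRI decoder might produce. The names `Triple`, `score_triple`, and `MISSING_LOGIT` are illustrative. Using the product t-norm for AND and a max over entity slots for EXISTS is a common neuro-symbolic convention, not something the source confirms.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical stand-ins for decoder outputs: in a real system these logits
# would be predicted from fMRI responses; here they are hand-set numbers.
UnaryLogits = Dict[Tuple[int, str], float]        # (entity slot, concept) -> logit
BinaryLogits = Dict[Tuple[int, int, str], float]  # (subject slot, object slot, relation) -> logit

MISSING_LOGIT = -10.0  # assumed default for concepts the decoder did not score


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


@dataclass(frozen=True)
class Triple:
    """A predicate-argument query: relation(subject, object)."""
    subject: str
    relation: str
    obj: str


def score_triple(q: Triple, unary: UnaryLogits, binary: BinaryLogits,
                 slots: Sequence[int]) -> float:
    """Soft truth value of: EXISTS a, b. subject(a) AND relation(a, b) AND obj(b).

    AND is the product t-norm and EXISTS is a max over ordered slot pairs,
    so the relation only scores highly when its arguments are grounded too.
    """
    best = 0.0
    for a in slots:
        for b in slots:
            if a == b:
                continue
            p = (sigmoid(unary.get((a, q.subject), MISSING_LOGIT))
                 * sigmoid(binary.get((a, b, q.relation), MISSING_LOGIT))
                 * sigmoid(unary.get((b, q.obj), MISSING_LOGIT)))
            best = max(best, p)
    return best


if __name__ == "__main__":
    unary = {(0, "person"): 4.0, (0, "horse"): -4.0,
             (1, "person"): -4.0, (1, "horse"): 4.0}
    binary = {(0, 1, "ride"): 3.0, (1, 0, "ride"): -3.0}
    slots = [0, 1]
    print(round(score_triple(Triple("person", "ride", "horse"), unary, binary, slots), 3))
    print(round(score_triple(Triple("horse", "ride", "person"), unary, binary, slots), 3))
```

Swapping the arguments of `ride` sharply lowers the score, so argument roles matter and are not just a bag of concepts. Because every query score is built from primitive concept scores, a combination never seen as a whole during training can still be scored. This is the intuition behind the generalization claim.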
Peer Reviews
Decision: ICLR 2026 Poster
I am not familiar with tasks with fMRI, but to the best of my knowledge the framework is new in bringing predicate–argument guidance into an fMRI-QA decoder and testing it via five alternatives within one executor. The overall evaluation setup looks correct to me: the compositional split is appropriate; train and test use disjoint entity–relation pairs, so it measures true generalization. The ablations are clean and isolate the source of gains: unguided multi-region grounding adds little over a
The primary concern is the novelty of the proposed approach. The proposed method assembles known pieces: a LEFT-style executor, VLM-derived scene graphs, and standard cortical parcellations. The main new element is predicate–argument guidance and the within-executor hypothesis family that tests it. I acknowledge that these are thoughtful design choices, but they are not a new model class or theory. That said, the paper still adds value: it gives a clean empirical test of the idea, uses a co
Despite fundamental issues, the paper has notable technical strengths: 1. A well-defined evaluator logic for creating conjunctions that could help in better decoding. 2. Strong empirical performance: 47% relative improvement over baselines is substantial, and the generalization to unseen compositional queries is genuinely impressive evidence that the learned representations support novel combinations. 3. Cross-dataset validation: Testing on both image and video datasets with consistent results
1. The Core Interpretive Problem: The paper's fundamental flaw is the interpretive leap from 'symbolic structure improves neural decoding' to 'relational meaning in the brain emerges from structured activations guided by hierarchical predicate-argument structure.' The authors consistently conflate improved task performance with evidence of compositional neural mechanisms, but these are different claims requiring different types of evidence. The 'grounding' results are classification logits indi
1. The paper is well motivated to tackle the puzzle of interpretable concept grounding in neural data, with an emphasis on compositional structure. 2. The results are consistent across two datasets and multiple concept types. 3. The ablation analyses meaningfully test hypotheses about region-level grounding and compositional generalization. 4. It positions neuro-symbolic modeling as a promising route to study structured neural semantics beyond pixel-level reconstruction.
1. The results in Tables 1–2 are trained and tested on one subject per dataset, which limits generalizability. Multi-subject or cross-subject validation is essential for neuroscientific conclusions. 2. Grounding is restricted to coarse atlas parcels; voxel-level analyses would strengthen claims about spatial specificity of “modular concepts.” 3. Results in Tables 1 and 2 use models trained on one subject from each dataset. 4. It remains unclear how much each region or network contributes to c
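The first review credits the evaluation with a compositional split in which train and test use disjoint entity–relation pairs. A minimal sketch of such a split follows. It is not the paper's code. Several details are assumptions made for illustration: the `(subject, relation, object)` query format, reading "entity–relation pair" as the `(subject, relation)` key, and the helper names `compositional_split` and `concepts_covered`.

```python
import random
from typing import List, Set, Tuple

Query = Tuple[str, str, str]  # hypothetical format: (subject, relation, object)


def compositional_split(queries: List[Query], test_fraction: float = 0.25,
                        seed: int = 0) -> Tuple[List[Query], List[Query]]:
    """Hold out whole (subject, relation) pairs so no pair appears in both splits."""
    pairs = sorted({(s, r) for s, r, _ in queries})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(pairs)
    n_test = max(1, round(len(pairs) * test_fraction))
    test_pairs = set(pairs[:n_test])
    train = [q for q in queries if (q[0], q[1]) not in test_pairs]
    test = [q for q in queries if (q[0], q[1]) in test_pairs]
    return train, test


def concepts_covered(train: List[Query], test: List[Query]) -> bool:
    """True if every primitive concept used at test time was seen in training,
    so the held-out pairs are novel combinations rather than novel concepts."""
    seen: Set[str] = {c for q in train for c in q}
    return all(c in seen for q in test for c in q)


if __name__ == "__main__":
    queries = [("person", "ride", "horse"), ("person", "hold", "cup"),
               ("dog", "chase", "ball"), ("person", "ride", "bike"),
               ("dog", "hold", "stick"), ("cat", "chase", "mouse")]
    train, test = compositional_split(queries)
    print("train:", train)
    print("test:", test)
    print("concepts covered:", concepts_covered(train, test))
```

A stricter split would also reject draws for which `concepts_covered` returns False. A held-out pair whose parts never appeared in training tests novel concepts rather than novel combinations.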
Taxonomy
Topics: Face Recognition and Perception · Multimodal Machine Learning Applications · Action Observation and Synchronization
