TL;DR
This paper introduces DeepObjectLog, a neurosymbolic model that learns object-centric representations directly from images using weak supervision, enabling improved reasoning and generalization without detailed annotations.
Contribution
The work presents a novel probabilistic neurosymbolic framework that integrates object-centric perception with symbolic reasoning under distant supervision, eliminating the need for explicit object labels.
Findings
Achieves superior out-of-distribution generalization across various visual reasoning tasks.
Learns object-level arguments directly from global labels without per-object annotations.
Outperforms existing neural object-centric and neurosymbolic baselines.
Abstract
Neurosymbolic learning can use symbolic rules to provide supervision for latent concepts from weak labels, but it commonly assumes that the entities referenced by these rules are already specified. Object-centric models decompose images into slot-like representations; however, such slots are not necessarily aligned with the predicates required for symbolic reasoning. We investigate object-centric neurosymbolic learning under distant supervision, where the object-level arguments of a logic program are learned directly from images using only global task labels. We introduce DeepObjectLog, a probabilistic neurosymbolic model that integrates a slot-based perceptual encoder with a probabilistic logic layer. The encoder predicts objectness and class probabilities for candidate object representations, while the logic layer marginalizes over latent objectness and class assignments to compute…
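To make the marginalization step in the abstract concrete, here is a minimal sketch. It is not the authors' implementation. It assumes an addition-style task (as in the CLEVR-addition setting the reviews mention), independent slots, and brute-force enumeration in place of a probabilistic logic engine. The function name `sum_distribution` and its inputs are hypothetical.

```python
from itertools import product


def sum_distribution(objectness, class_probs):
    """Return {total: probability} for the sum of the digits of all real objects.

    objectness[i]  : assumed probability that slot i holds a real object
    class_probs[i] : assumed distribution over digit classes for slot i
    Slots are treated as independent (an assumption of this sketch).
    """
    n_classes = len(class_probs[0])

    # Per-slot outcomes: None means "not an object", otherwise a digit class.
    per_slot = []
    for p_obj, probs in zip(objectness, class_probs):
        outcomes = [(None, 1.0 - p_obj)]
        outcomes += [(c, p_obj * probs[c]) for c in range(n_classes)]
        per_slot.append(outcomes)

    # Enumerate every joint assignment ("possible world") and add its
    # probability to the total that the world produces.
    dist = {}
    for world in product(*per_slot):
        prob = 1.0
        total = 0
        for value, p in world:
            prob *= p
            if value is not None:
                total += value
        dist[total] = dist.get(total, 0.0) + prob
    return dist
```

For example, with `objectness = [1.0, 0.5]` and one-hot class distributions on digits 1 and 2, the probability mass splits evenly between the totals 1 and 3. Only the global total is observed during training, so the learning signal for objectness and class has to flow through this marginal. Exhaustive enumeration is exponential in the number of slots and is used here only to show the semantics.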
Peer Reviews
Decision · Submitted to ICLR 2026
- The proposed model overcomes one of the primary weaknesses of neurosymbolic approaches, which is that they typically depend on very detailed object-level annotations to train the perceptual encoder. This approach more deeply integrates the neural/perceptual and symbolic/reasoning components, enabling it to be trained based on downstream task error.
- The model demonstrates promising results on various out-of-distribution generalization settings.
- The paper is very clearly written, and nicely
- The primary limitation is that only synthetic tasks are investigated. Additionally, these tasks both involve relatively simple classification, so they do not provide the strongest test of the symbolic component's reasoning abilities. The CLEVR-addition dataset is a step in the right direction, but this task still involves simple, synthetic images and limited (one-step) reasoning. It would be more compelling if the model could be extended to tasks that require multi-step reasoning and inference
* The paper is very well written and is easy to follow.
* The paper tackles an ambitious problem, namely, performing logical reasoning over symbols by leveraging object-centric representations.
* Integrating ProbLog with a slot-based neural network model is novel to the best of my understanding and is an interesting approach.
* The experimental section is well presented and the results showing the benefits of DeepObjectLog are convincing.
* I found the presentation in Section 3.4 a bit confusing regarding how ProbLog is integrated into the model. Is the logical rule/task to be performed (e.g., add the two numbers in the image) defined a priori, or is the task also inferred from observed data?
* I can imagine DeepObjectLog working well on simple logical reasoning tasks such as those the authors tested; however, I am skeptical of how the method will perform on more complex reasoning tasks in which the symbols
1. The paper is well written, clear, and easy to follow.
2. The evaluations focus on out-of-distribution tasks, which are highly relevant and remain partly unresolved in recent years.
1. Fairly simple evaluation: (1) datasets are synthetic toy datasets, which is acknowledged in the limitations, and (2) the tasks are quite simple reasoning problems that can be expressed as a multiclass classification problem. I would like to see comparisons on (1) a more complex, general task such as VQA and (2) at least one real-world (or close to real-world) dataset.
2. Some claims in the paper seem not to be correct, and some of the main conclusions are already well known in the object-cent
Taxonomy
Topics: Multimodal Machine Learning Applications · Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks
