A Pooling Approach to Modelling Spatial Relations for Image Retrieval and Annotation
Mateusz Malinowski, Mario Fritz

TL;DR
This paper introduces a pooling-based method for modeling spatial relations in images, enhancing image retrieval and annotation tasks involving spatial language through a learning-based approach.
Contribution
It proposes a novel pooling interpretation of spatial relations and demonstrates its effectiveness in improving spatial reasoning in image understanding tasks.
Findings
Improved performance over previous work on image retrieval and annotation, shown on two datasets.
Spatial relation representations are learned as parameters of the pooling operator.
Additional insights from experiments on a new dataset with an explicit focus on spatial relations.
Abstract
Over the last two decades we have witnessed strong progress on modeling visual object classes, scenes and attributes that has significantly contributed to automated image understanding. On the other hand, surprisingly little progress has been made on incorporating a spatial representation and reasoning in the inference process. In this work, we propose a pooling interpretation of spatial relations and show how it improves image retrieval and annotation tasks involving spatial language. Due to the complexity of spatial language, we argue for a learning-based approach that acquires a representation of spatial relations by learning parameters of the pooling operator. We show improvements over previous work on two datasets and two different tasks, and provide additional insights on a new dataset with an explicit focus on spatial relations.
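To make the idea of a spatial relation as a pooling operation more concrete, here is a minimal sketch. It is not the authors' implementation: the grid layout, the max-pooling per cell, the function names (`pool_around_reference`, `relation_score`) and the hand-set weight masks are all assumptions chosen for illustration. In a learning-based setup such as the one the abstract describes, the weights would be fitted from data rather than written by hand.

```python
import numpy as np

def pool_around_reference(ref_box, centers, scores, grid=3, extent=2.0):
    """Max-pool detection scores into a grid x grid layout centred on ref_box.

    Hypothetical helper. Offsets are measured in units of the reference box
    size; detections at or beyond `extent` box sizes away are ignored.
    Image coordinates are assumed, so y grows downward and row 0 is "above".
    """
    x0, y0, x1, y1 = ref_box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    w, h = x1 - x0, y1 - y0
    pooled = np.zeros((grid, grid))
    for (px, py), s in zip(centers, scores):
        u, v = (px - cx) / w, (py - cy) / h
        if abs(u) >= extent or abs(v) >= extent:
            continue
        col = int((u + extent) / (2 * extent) * grid)
        row = int((v + extent) / (2 * extent) * grid)
        pooled[row, col] = max(pooled[row, col], s)
    return pooled

def relation_score(pooled, weights):
    """Score a relation as a weighted sum over pooled cells.

    `weights` stands in for the learnable pooling parameters (assumption).
    """
    return float(np.sum(pooled * weights))

# Hand-set masks standing in for learned parameters (illustrative only).
ABOVE = np.array([[1, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
BELOW = np.array([[0, 0, 0], [0, 0, 0], [1, 1, 1]], dtype=float)

ref = (4.0, 4.0, 6.0, 6.0)                      # reference object box
pooled = pool_around_reference(ref, [(5.0, 2.0)], [0.9])
above_score = relation_score(pooled, ABOVE)
below_score = relation_score(pooled, BELOW)
```

In this toy setup a single detection centred above the reference box contributes only to the "above" score. Replacing the fixed masks with trainable weights is one way to read "learning parameters of the pooling operator", though the paper's actual parameterisation may differ.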
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
