Broadcasting Convolutional Network for Visual Relational Reasoning
Simyung Chang, John Yang, Seonguk Park, Nojun Kwak

TL;DR
This paper introduces the Broadcasting Convolutional Network (BCN) and the Multi-Relational Network (multiRN). Together they improve visual relational reasoning by relating each object to multiple objects at once, reducing the cost from O(n^2) to O(n) in the number of objects n and achieving state-of-the-art results on the CLEVR dataset.
Contribution
The paper presents BCN for effective feature extraction and introduces multiRN, extending relation reasoning to multiwise relations with linear complexity, improving over traditional pairwise methods.
Findings
multiRN achieves state-of-the-art performance on the CLEVR dataset
BCN effectively embeds location information into feature maps
multiRN reduces computational complexity from O(n^2) to O(n)
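The complexity claim above can be illustrated by counting how often a relation function would be evaluated. This is a minimal sketch, not the authors' code: it assumes a pairwise RN scores every ordered pair of objects, and that a multiwise scheme scores each object once against a shared summary of all objects. The function names are hypothetical.

```python
from itertools import product


def count_pairwise_calls(objects):
    """Count evaluations when every ordered pair (a, b) is scored (assumed RN-style)."""
    calls = 0
    for _a, _b in product(objects, repeat=2):
        calls += 1
    return calls


def count_multiwise_calls(objects):
    """Count evaluations when each object is scored once against a shared summary."""
    calls = 0
    for _obj in objects:
        calls += 1
    return calls


# Hypothetical setting: an 8x8 feature map where each position is one "object".
objects = list(range(64))
pairwise = count_pairwise_calls(objects)
multiwise = count_multiwise_calls(objects)
print(pairwise, multiwise)  # 4096 64
```

With n = 64 the pairwise count is n^2 = 4096 while the per-object count is n = 64, which is the gap the finding refers to.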
Abstract
In this paper, we propose the Broadcasting Convolutional Network (BCN), which extracts key object features from the global field of an entire input image and recognizes their relationship with local features. BCN is a simple network module that collects effective spatial features, embeds location information, and broadcasts them to the entire feature map. We further introduce the Multi-Relational Network (multiRN), which improves the existing Relation Network (RN) by utilizing the BCN module. In pixel-based relation reasoning problems, with the help of BCN, multiRN extends the concept of 'pairwise relations' in conventional RNs to 'multiwise relations' by relating each object with multiple objects at once. This yields O(n) complexity for n objects, a vast computational gain over RNs, which take O(n^2). Through experiments, multiRN has achieved a state-of-the-art performance on…
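The three steps the abstract attributes to BCN (collect spatial features, embed location information, broadcast to the whole feature map) might be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes location is embedded as two normalized coordinate channels, that global features are collected by max pooling over all positions, and that broadcasting means tiling the pooled vector and concatenating it at every position. The names `add_coordinates` and `broadcast_global` are hypothetical, and the learned convolution layers are omitted.

```python
import numpy as np


def add_coordinates(features):
    """Append normalized (y, x) coordinate channels to an (H, W, C) feature map."""
    h, w, _ = features.shape
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, h), np.linspace(-1.0, 1.0, w), indexing="ij"
    )
    return np.concatenate([features, ys[..., None], xs[..., None]], axis=-1)


def broadcast_global(features):
    """Max-pool over all positions, then tile the pooled vector onto every position."""
    h, w, c = features.shape
    pooled = features.reshape(h * w, c).max(axis=0)    # (C,) assumed global max pooling
    tiled = np.broadcast_to(pooled, (h, w, c))         # same vector at every position
    return np.concatenate([features, tiled], axis=-1)  # (H, W, 2C)


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 4, 8))  # toy (H, W, C) input
with_coords = add_coordinates(feature_map)    # shape (4, 4, 10)
out = broadcast_global(with_coords)           # shape (4, 4, 20)
print(with_coords.shape, out.shape)
```

After this step every position holds its own local features next to one shared global vector, so a relation function applied once per position sees both, which is consistent with the O(n) cost described above.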
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
