PQA: Perceptual Question Answering
Yonggang Qi, Kai Zhang, Aneeshan Sain, Yi-Zhe Song

TL;DR
This paper introduces the perceptual question answering (PQA) challenge, a new dataset and task that asks models to generate answers according to perceptual organization principles. It emphasizes synthetic data and pattern synthesis over the traditional interpretation of real scenes.
Contribution
It presents the first dataset of perceptual question-answer pairs based on Gestalt principles and proposes a novel self-attention-based model that solves these questions by generating answers from scratch.
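The following is a minimal sketch of single-head scaled dot-product self-attention over the cells of a small grid. It only illustrates what "self-attention based" refers to. It assumes that each grid cell is one token with a one-hot color embedding. That assumption is ours and is not the paper's stated design. The function name self_attention, the embedding size d, and the random projection weights are illustrative. This is not the authors' architecture.

```python
import numpy as np

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over an (n, d_in) token matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)     # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v, weights

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(3, 3))               # hypothetical toy binary question grid
tokens = np.eye(2)[grid.reshape(-1)]                 # one-hot embed each cell: shape (9, 2)
d = 4                                                # hypothetical projection size
w_q, w_k, w_v = (rng.standard_normal((2, d)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape)  # (9, 4)
```

No positional encoding is added here, so the layer treats the cells as an unordered set. A model operating on grids would normally add row and column position information so that spatial layout, which matters for Gestalt grouping, is visible to the attention.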
Findings
The proposed model outperforms naive and strong baselines.
Humans need significantly less data than the models to learn these perceptual tasks.
Synthetic data effectively captures perceptual organization principles.
Abstract
Perceptual organization remains one of the very few established theories on the human visual system. It underpinned many seminal pre-deep-learning works on segmentation and detection, yet research on it has declined rapidly since the field shifted toward learning deep models. Of the limited attempts, most aimed at interpreting complex visual scenes using perceptual organizational rules. This has, however, proven to be sub-optimal, since models were unable to effectively capture the visual complexity of real-world imagery. In this paper, we rejuvenate the study of perceptual organization by advocating two positional changes: (i) we examine purposefully generated synthetic data instead of complex real imagery, and (ii) we ask machines to synthesize novel perceptually-valid patterns instead of explaining existing data. Our overall answer lies with the introduction of a novel visual challenge…
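The following is a toy illustration of "purposefully generated synthetic data" for one perceptual organization rule. It builds a question-answer pair for grouping by proximity. The question is a binary grid. The answer gives each 4-connected group of filled cells its own label. Several pieces are hypothetical and are not taken from the PQA dataset specification: the grid encoding, the use of 4-connected component labeling as a stand-in for proximity grouping, and the function name make_proximity_pair.

```python
from collections import deque

def make_proximity_pair(grid):
    """Hypothetical generator: return (question, answer), where the answer labels
    each 4-connected group of filled cells with a distinct positive integer
    (0 marks background). Connectivity is a stand-in for Gestalt proximity."""
    rows, cols = len(grid), len(grid[0])
    answer = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and answer[r][c] == 0:
                # Breadth-first flood fill from an unlabeled filled cell.
                answer[r][c] = next_label
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and answer[ny][nx] == 0):
                            answer[ny][nx] = next_label
                            queue.append((ny, nx))
                next_label += 1
    return grid, answer

question = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
_, answer = make_proximity_pair(question)
for row in answer:
    print(row)
# [1, 1, 0, 0]
# [1, 0, 0, 2]
# [0, 0, 2, 2]
```

Generating data this way means the correct answer is known by construction. That is the practical appeal of synthetic data over annotating real imagery.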
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
