Domain Generalization through Spatial Relation Induction over Visual Primitives
Dat Nguyen, Duc-Duy Nguyen

TL;DR
This paper introduces PARSE, a novel domain generalization framework that explicitly models visual primitives and their spatial relations, leading to improved classification robustness across domains.
Contribution
PARSE explicitly factors visual recognition into primitives and their relations, enabling end-to-end learning of structural compositions for better domain generalization.
Findings
PARSE improves accuracy by over 4.5 percentage points on CUB-DG.
PARSE remains competitive with existing methods on DomainBed.
The approach models spatial relations with differentiable predicates.
Abstract
Domain generalization requires identifying stable representations that support reliable classification across domains. Most existing methods seek such stability through improving the training process, for example, through model selection strategies, data augmentation, or feature-alignment objectives. Although these strategies can be effective, they leave the representation learning of structural composition implicit, which may limit performance on compositional domain generalization benchmarks. In this work, we propose Primitive-Aware Relational Structure for domain gEneralization (PARSE), an image classification framework that factors visual recognition into visual primitives and their relational composition. We represent these compositions using soft binary, ternary, and quaternary predicates over primitive locations, yielding differentiable measures of spatial alignment that can be…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
