Loading paper
Adding simple structure at inference improves Vision-Language Compositionality | Tomesphere