MARMOT: A Deep Learning Framework for Constructing Multimodal Representations for Vision-and-Language Tasks