TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation