TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation
Wubo Li, Wei Zou, Xiangang Li

TL;DR
This paper introduces TCT, a cross-supervised learning method using transformers to improve multimodal sequence representations, achieving state-of-the-art results in video-grounded dialogue tasks.
Contribution
The paper proposes TCT, a novel transformer-based cross-supervised learning approach for multimodal sequences, enhancing semantic representation over traditional unimodal methods.
Findings
TCT improves semantic quality of multimodal representations.
MTN-TCT achieves new state-of-the-art in video-grounded dialogue.
Learned representations outperform direct unimodal approaches.
Abstract
Multimodal inputs generally provide better performance than unimodal inputs in most tasks. However, efficiently learning the semantics of representations from multiple modalities is extremely challenging. To tackle this, we propose the Transformer-based Cross-modal Translator (TCT), which learns unimodal sequence representations by translating from other related multimodal sequences in a supervised manner. Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on video-grounded dialogue, a task that uses multiple modalities. The proposed method reports new state-of-the-art performance on video-grounded dialogue, which indicates that the representations learned by TCT are more semantically meaningful than those obtained by using the unimodal inputs directly.
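The abstract does not spell out the architecture, so the following is only an illustrative sketch of two generic building blocks that a transformer-based translator between modalities typically relies on: scaled dot-product cross-attention (positions of one modality attend over features of another) and a label-smoothed cross-entropy loss for the supervised translation target. It is not the authors' implementation. The function names, the toy feature vectors, the choice of which modality acts as query, and the smoothing value `eps=0.1` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query vector (assumed here to come from the target modality)
    attends over the key/value vectors of the source modality and
    returns a weighted average of the values.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def label_smoothed_nll(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution.

    The gold class gets weight (1 - eps) + eps / n; every other class
    gets eps / n, where n is the number of classes.
    """
    probs = softmax(logits)
    n = len(logits)
    loss = 0.0
    for i, p in enumerate(probs):
        t = (1.0 - eps) * (1.0 if i == target else 0.0) + eps / n
        loss -= t * math.log(p)
    return loss


# Toy example (hypothetical features): two "text" positions attend over
# three "video" feature vectors.
video_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_queries = [[1.0, 0.0], [0.0, 1.0]]
translated = cross_attention(text_queries, video_feats, video_feats)

# Supervised signal for one decoding step over a 3-token toy vocabulary.
loss = label_smoothed_nll([2.0, 0.0, 0.0], target=0)
```

In a full model these pieces would sit inside stacked encoder and decoder layers with learned projections, residual connections, and position encodings; the sketch leaves all of that out to keep the cross-modal attention step and the training signal visible.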
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Speech and dialogue systems
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
