X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

Yinan He; Gengshi Huang; Siyu Chen; Jianing Teng; Wang Kun; Zhenfei Yin; Lu Sheng; Ziwei Liu; Yu Qiao; Jing Shao

arXiv:2203.08764·cs.CV·March 17, 2022

TL;DR

X-Learner is a novel framework that jointly learns from multiple vision tasks and data sources to produce universal visual representations, improving transferability across various downstream tasks without extra annotations.

Contribution

The paper introduces X-Learner, a new representation learning method that bridges gaps among heterogeneous tasks and data sources for better universal visual features.

Findings

01 Achieves 3.0%, 3.3%, and 1.8% improvements on classification, detection, and segmentation tasks.

02 Demonstrates strong performance without extra annotations, modalities, or computational costs.

03 Effectively learns universal features through expansion and squeeze stages.

Abstract

In computer vision, pre-training models based on large-scale supervised learning have proven effective over the past few years. However, existing works mostly focus on learning from an individual task with a single data source (e.g., ImageNet for classification or COCO for detection). This restricted form limits their generalizability and usability because it lacks the vast semantic information available from various tasks and data sources. Here, we demonstrate that jointly learning from heterogeneous tasks and multiple data sources contributes to a universal visual representation, leading to better transfer results on various downstream tasks. Thus, learning how to bridge the gaps among different tasks and data sources is the key, but it remains an open question. In this work, we propose a representation learning framework called X-Learner, which learns the universal feature of multiple…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI