TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation

Rongkun Zheng; Lu Qi; Xi Chen; Yi Wang; Kun Wang; Yu Qiao; Hengshuang Zhao

arXiv:2312.06630 · cs.CV · March 19, 2024 · 1 citation

TL;DR

TMT-VIS introduces a taxonomy-aware joint training approach for video instance segmentation that uses taxonomy information to sharpen model focus, improve classification accuracy, and achieve state-of-the-art results across multiple benchmarks.

Contribution

The paper proposes a novel two-stage taxonomy aggregation module that enhances multi-dataset training by incorporating taxonomy priors into instance queries for better segmentation performance.
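
As a rough illustration of what "incorporating taxonomy priors into instance queries" could look like, the sketch below uses two assumed stages: first pick the category embeddings most similar to a pooled video feature, then let each instance query attend over the selected embeddings and add the result back as a residual. This is not the authors' implementation. The function names (`select_taxonomy`, `inject_taxonomy`), the toy embeddings, and the choice of single-head dot-product attention are all hypothetical and chosen only to make the idea concrete.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_taxonomy(video_feature, taxonomy_embeddings, k):
    """Stage 1 (assumed): keep the k categories most similar to the video feature."""
    ranked = sorted(
        taxonomy_embeddings,
        key=lambda name: dot(video_feature, taxonomy_embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

def inject_taxonomy(queries, selected, taxonomy_embeddings):
    """Stage 2 (assumed): each query attends over the selected embeddings,
    and the attended context is added to the query as a residual."""
    keys = [taxonomy_embeddings[name] for name in selected]
    scale = math.sqrt(len(keys[0]))
    updated = []
    for q in queries:
        weights = softmax([dot(q, key) / scale for key in keys])
        context = [
            sum(w * key[d] for w, key in zip(weights, keys))
            for d in range(len(q))
        ]
        updated.append([qd + cd for qd, cd in zip(q, context)])
    return updated

# Toy data (hypothetical): three categories with 3-dimensional embeddings.
taxonomy_embeddings = {
    "person": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
video_feature = [0.9, 0.8, 0.1]
instance_queries = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

selected = select_taxonomy(video_feature, taxonomy_embeddings, k=2)
updated_queries = inject_taxonomy(instance_queries, selected, taxonomy_embeddings)
```

In this toy setup the "car" embedding is filtered out in the first stage, so the updated queries receive no contribution along that dimension. A real model would use learned projections and multi-head attention rather than raw dot products.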

Findings

01 Significant performance improvements over baselines.

02 State-of-the-art results on four challenging benchmarks.

03 Effective generalization across diverse datasets.

Abstract

Training on large-scale datasets can boost the performance of video instance segmentation, but annotated datasets for VIS are hard to scale up due to the high labor cost. What we possess are numerous isolated field-specific datasets; thus, it is appealing to jointly train models across the aggregation of datasets to enhance data volume and diversity. However, due to the heterogeneity in category space, as mask precision increases with the data volume, simply utilizing multiple datasets will dilute the attention of models on different taxonomies. Thus, increasing the data scale and enriching taxonomy space while improving classification precision is important. In this work, we analyze that providing extra taxonomy information can help models concentrate on specific taxonomy, and propose our model named Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation…

Code & Models

Repositories

rkzheng99/tmt-vis
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques