Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli, Di Liu, Abhishek Aich, Dimitris Metaxas, Samuel Schulter

TL;DR
This paper presents a novel multi-dataset image segmentation training method that uses language embeddings to handle semantic inconsistencies across datasets, improving performance on multiple benchmarks.
Contribution
We introduce a simple approach using language-based class embeddings and label space queries to effectively resolve semantic conflicts in multi-dataset training.
Findings
Outperforms previous methods by 1.6% mIoU on semantic segmentation
Achieves 9.1% higher PQ in panoptic segmentation
Improves AP by 12.1% in instance segmentation
Abstract
Leveraging multiple training datasets to scale up image segmentation models is beneficial for increasing robustness and semantic understanding. Individual datasets have well-defined ground truth with non-overlapping mask layouts and mutually exclusive semantics. However, merging them for multi-dataset training disrupts this harmony and leads to semantic inconsistencies; for example, the class "person" in one dataset and class "face" in another will require multilabel handling for certain pixels. Existing methods struggle with this setting, particularly when evaluated on label spaces mixed from the individual training sets. To overcome these issues, we introduce a simple yet effective multi-dataset training approach by integrating language-based embeddings of class names and label space-specific query embeddings. Our method maintains high performance regardless of the underlying…
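The "person" versus "face" conflict described in the abstract can be illustrated with a toy sketch. This is not the paper's method; it only shows why merging label spaces forces multilabel handling. The label maps, the helper `merge_labels`, and the assumption that the same pixels are annotated under both label spaces are all hypothetical and chosen for illustration.

```python
# Hypothetical toy data: a strip of 4 pixels annotated under two label spaces.
# Within each dataset, every pixel has exactly one (mutually exclusive) label.
dataset_a = ["background", "person", "person", "person"]        # knows "person"
dataset_b = ["background", "background", "face", "background"]  # knows "face"


def merge_labels(*label_maps, ignore=("background",)):
    """Union the per-pixel labels of several label maps into one set per pixel."""
    merged = []
    for pixel_labels in zip(*label_maps):
        merged.append({name for name in pixel_labels if name not in ignore})
    return merged


merged = merge_labels(dataset_a, dataset_b)

# Pixels that carry more than one label after merging are the ones that
# break the "mutually exclusive semantics" assumption of a single dataset.
multilabel_pixels = [i for i, labels in enumerate(merged) if len(labels) > 1]

for i, labels in enumerate(merged):
    print(i, sorted(labels))
```

In this sketch the third pixel ends up with both "person" and "face", so a single-label classifier trained on the merged label space would have no consistent target for it.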
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
