Adaptive Self-training Framework for Fine-grained Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park

TL;DR
This paper introduces a self-training framework with adaptive pseudo-labeling and graph structure learning to improve fine-grained scene graph generation, addressing the long-tailed predicate distribution and missing annotations in benchmark datasets.
Contribution
It proposes a model-agnostic self-training framework with a new pseudo-labeling technique and graph structure learner for better scene graph generation.
Findings
Enhanced predicate classification performance.
Effective handling of long-tailed predicate distribution.
Improved results on various SGG models.
Abstract
Scene graph generation (SGG) models have suffered from inherent problems regarding the benchmark datasets such as the long-tailed predicate distribution and missing annotation problems. In this work, we aim to alleviate the long-tailed problem of SGG by utilizing unannotated triplets. To this end, we introduce a Self-Training framework for SGG (ST-SGG) that assigns pseudo-labels for unannotated triplets based on which the SGG models are trained. While there has been significant progress in self-training for image recognition, designing a self-training framework for the SGG task is more challenging due to its inherent nature such as the semantic ambiguity and the long-tailed distribution of predicate classes. Hence, we propose a novel pseudo-labeling technique for SGG, called Class-specific Adaptive Thresholding with Momentum (CATM), which is a model-agnostic framework that can be…
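The idea of class-specific thresholds with momentum can be illustrated with a minimal Python sketch. It is not the authors' CATM update rule, which the truncated abstract does not spell out. It assumes a plain exponential moving average over per-class prediction confidences, and the function names (`update_thresholds`, `assign_pseudo_labels`), the `momentum` value, and the example predicates are all hypothetical.

```python
from collections import defaultdict


def assign_pseudo_labels(predictions, thresholds):
    """Return a pseudo-label per unannotated triplet, or None if rejected.

    predictions: list of (predicted_class, confidence) pairs.
    thresholds:  dict mapping class -> current confidence threshold.
    Classes without a threshold are never pseudo-labeled (assumption).
    """
    return [
        cls if conf >= thresholds.get(cls, 1.0) else None
        for cls, conf in predictions
    ]


def update_thresholds(thresholds, predictions, momentum=0.9):
    """Move each class threshold toward the batch mean confidence (EMA).

    This is an assumed, generic momentum update, not the paper's rule.
    Classes absent from the batch keep their previous threshold.
    """
    by_class = defaultdict(list)
    for cls, conf in predictions:
        by_class[cls].append(conf)

    updated = dict(thresholds)
    for cls, confs in by_class.items():
        batch_mean = sum(confs) / len(confs)
        previous = thresholds.get(cls, batch_mean)
        updated[cls] = momentum * previous + (1.0 - momentum) * batch_mean
    return updated


# Toy batch: a frequent ("on") and a fine-grained ("standing on") predicate.
thresholds = {"on": 0.9, "standing on": 0.3}
batch = [("on", 0.8), ("standing on", 0.5), ("on", 0.95)]

labels = assign_pseudo_labels(batch, thresholds)
# labels == [None, "standing on", "on"]
thresholds = update_thresholds(thresholds, batch)
```

Keeping a separate threshold per predicate lets a rare, fine-grained class accept pseudo-labels at a lower confidence than a frequent class, which is the intuition behind thresholding per class rather than globally.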
Peer Reviews
Decision: ICLR 2024 poster
- ST-SGG serves as a model-agnostic framework, meaning it can be applied to various existing SGG models. This aspect has the potential to expand the applicability of self-training in SGG.
- Based on the experimental results, the proposed framework seems to effectively alleviate the issues of long-tailed distribution. The performance improvements are primarily concentrated on fine-grained predicate classes.
- In some experiments, the R@k values significantly decreased after employing the proposed framework. The paper lacks some explanations for this phenomenon.
- There's a concern if the proposed framework might, in some scenarios, sacrifice a considerable amount of overall performance to achieve improvement in fine-grained predicates.
- The paper lacks a clear depiction of the overall framework structure. Please clearly demonstrate how the different components interact and how pseudo-labels are ge
This paper has the following strengths:
**1. Well-motivated:** The paper's motivation is well stated and clearly conveyed to the reader. Exploring the vast number of unannotated triplets in image databases is one of the desired directions for open-set annotation datasets, and Visual Genome is one of the most useful resources for conducting such exploration. This paper successfully establishes this motivation in its introduction.
**2. Novel self-training loss:** The proposed simple yet novel and effec
This paper has the following weaknesses:
**1. Comparison with other debiasing methods missing.** The paper demonstrates that the method can improve the diversity of the baseline models. However, such improvement over baseline models is now prevalent in the SGG literature. Therefore, a direct comparison with other debiasing methods would shed more light on its performance. For example, VCTree+ST-SGG should be compared with VCTree+Resample. With that comparison, the readers would have better under
1. The idea is well-motivated and intuitive. They provide an in-depth analysis of long-tailed SGG.
2. The paper is well written and easy to understand.
3. Extensive experiments and ablations verify the effectiveness of enhancing the performance of tail classes.
1. The technical contribution of this paper might be limited. Self-training is already used for image classification, adaptive thresholding is used to obtain accurate pseudo-labels in many tasks, and GSL is also not new. The authors should discuss their novelty and the SGG-specific variants in more depth.
2. There are some SOTA SGG methods that might achieve higher performance than this paper. These methods should also be included for comparison and analysis, such as VETO[1], GCL[2], PE-Net
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
