Learning of Visual Relations: The Devil is in the Tails

Alakh Desai; Tz-Ying Wu; Subarna Tripathi; Nuno Vasconcelos

arXiv:2108.09668 · cs.CV · August 24, 2021

TL;DR

This paper proposes a simple yet effective training approach for visual relation models that addresses the long-tailed distribution challenge, outperforming complex models in scene graph generation.

Contribution

It introduces a novel decoupled training scheme with a new sampling method, demonstrating that simplicity combined with targeted sampling improves long-tailed visual relation learning.

Findings

1. DT2-ACBS outperforms complex state-of-the-art methods.
2. Simple models can be highly effective with proper training strategies.
3. Addressing long-tailed distributions is crucial for scene graph tasks.

Abstract

Significant effort has recently been devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited for long-tailed problems due to their tendency to overfit. In this paper, we explore an alternative hypothesis, denoted "the Devil is in the Tails." Under this hypothesis, better performance is achieved by keeping the model simple but improving its ability to cope with long-tailed distributions. To test this hypothesis, we devise a new approach for training visual relationship models, which is inspired by state-of-the-art long-tailed recognition literature. This is based on an iterative decoupled training…
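The abstract attributes the difficulty to the long-tailed distribution of relation labels. The sketch below illustrates class-balanced sampling, one generic tool from the long-tailed recognition literature. It is not a reproduction of the paper's DT2 or ACBS procedure, which this summary does not detail. The idea: first draw a class uniformly at random, then draw an example of that class, so rare classes are seen as often as frequent ones. The function names `class_balanced_sample` and `class_frequencies` and the toy predicate labels are hypothetical.

```python
import random
from collections import defaultdict


def class_balanced_sample(labels, num_samples, seed=0):
    """Return example indices drawn so every class is equally likely.

    Two-stage sampling (with replacement): pick a class uniformly,
    then pick one example of that class uniformly.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(num_samples)]


def class_frequencies(labels, indices):
    """Count how often each class appears among the sampled indices."""
    counts = defaultdict(int)
    for i in indices:
        counts[labels[i]] += 1
    return dict(counts)


# Hypothetical long-tailed predicate labels: "on" dominates, "riding" is rare.
labels = ["on"] * 900 + ["has"] * 90 + ["riding"] * 10
idx = class_balanced_sample(labels, 3000)
print(class_frequencies(labels, idx))
```

Under plain instance-level sampling, "riding" would appear in about 1% of draws. Under this scheme each of the three classes is drawn roughly a third of the time. In decoupled training schemes, a balanced sampler like this is commonly used when re-training the classifier, while the feature extractor is trained with ordinary instance sampling. How the paper combines the two is not stated in this summary.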
