Learning of Visual Relations: The Devil is in the Tails
Alakh Desai, Tz-Ying Wu, Subarna Tripathi, Nuno Vasconcelos

TL;DR
This paper proposes a simple training approach for visual relation models that targets the long-tailed distribution of visual relations and outperforms more complex models on scene graph generation.
Contribution
It introduces an iterative decoupled training scheme paired with a new sampling method, and shows that a simple model trained with sampling suited to long-tailed data improves visual relation learning.
Findings
DT2-ACBS outperforms complex state-of-the-art methods
Simple models can be highly effective when trained with strategies designed for long-tailed data
Addressing long-tailed distributions is crucial for scene graph tasks
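To make the long-tail issue concrete, the sketch below contrasts ordinary instance-balanced sampling with generic class-balanced sampling, a standard technique from the long-tailed recognition literature. It is not the paper's sampling method, which is not specified in this summary. The toy predicate labels, their counts, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter


def instance_balanced_sample(labels, k, rng):
    """Draw k indices uniformly over instances, so head classes dominate."""
    return [rng.randrange(len(labels)) for _ in range(k)]


def class_balanced_sample(labels, k, rng):
    """Draw k indices by picking a class uniformly, then one of its instances."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)]) for _ in range(k)]


# Hypothetical long-tailed predicate labels (assumed counts, not from the paper).
labels = ["on"] * 900 + ["holding"] * 90 + ["riding"] * 10

rng = random.Random(0)
inst = Counter(labels[i] for i in instance_balanced_sample(labels, 3000, rng))
bal = Counter(labels[i] for i in class_balanced_sample(labels, 3000, rng))

print("instance-balanced:", dict(inst))
print("class-balanced:   ", dict(bal))
```

With instance-balanced sampling the rare class "riding" is drawn about 1% of the time, while class-balanced sampling draws each class roughly equally often, which is why sampling strategy matters so much for tail classes.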
Abstract
Significant effort has recently been devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited for long-tailed problems, because complex models tend to overfit. In this paper, we explore an alternative hypothesis, denoted the Devil is in the Tails. Under this hypothesis, better performance is achieved by keeping the model simple but improving its ability to cope with long-tailed distributions. To test this hypothesis, we devise a new approach for training visual relationship models, which is inspired by state-of-the-art long-tailed recognition literature. This is based on an iterative decoupled training…
