Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition

Mingzhou Liu; Xinwei Sun; Fandong Zhang; Yizhou Yu; Yizhou Wang

arXiv:2110.04042 · cs.CV · October 11, 2021


TL;DR

This paper introduces Context-LGM, a hierarchical generative model that explicitly captures object-context relations using a latent variable approach and Transformer-based inference, improving context-aware object recognition accuracy.

Contribution

A novel hierarchical latent generative model that explicitly models object-context relations and uses a Transformer for contextual inference, advancing context-aware recognition methods.
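The contribution relies on a Transformer for contextual inference, and the Methods list below names multi-head attention from "Attention Is All You Need". The sketch below shows only the core operation, single-head scaled dot-product attention, in which an object token queries a set of context tokens and receives an attention-weighted context summary. It is an illustration under stated assumptions, not the authors' architecture: there are no learned query/key/value projections, and the helper name `attend_to_context` and the toy token vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_to_context(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention (hypothetical helper).

    The object token `query` is compared against every context key; the
    softmax-normalized scores weight the context values into one summary vector.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Toy usage: the object token is more similar to the first context token,
# so the summary leans toward that token's value.
object_token = [1.0, 0.0]
context_tokens = [[1.0, 0.0], [0.0, 1.0]]
summary = attend_to_context(object_token, context_tokens, context_tokens)
```

Because the weights come from a softmax, the summary is always a convex combination of the context values; multi-head attention runs several such heads on learned projections and concatenates the results.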

Findings

1. Achieved state-of-the-art results on lung cancer prediction.
2. Improved emotion recognition accuracy.
3. Effectively models object-context relations with latent variables.

Abstract

Context, referring to situational factors related to the object of interest, can help infer the object's states or properties in visual recognition. As such contextual features are too diverse (across instances) to be annotated, existing attempts simply exploit image labels as supervision to learn them, resulting in various contextual tricks, such as feature pyramids, context attention, etc. However, without carefully modeling the context's properties, especially its relation to the object, their estimated context can suffer from large inaccuracy. To amend this problem, we propose a novel Contextual Latent Generative Model (Context-LGM), which considers the object-context relation and models it in a hierarchical manner. Specifically, we first introduce a latent generative model with a pair of correlated latent variables to respectively model the object and context, and embed their…
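The abstract models the object and its context with "a pair of correlated latent variables", but the visible (truncated) text does not give their distribution. As a minimal sketch, assume a standard bivariate Gaussian prior with a single correlation parameter `rho`; the function names `sample_correlated_latents` and `sample_correlation` are hypothetical and this is not the paper's actual prior.

```python
import math
import random
from typing import List, Tuple


def sample_correlated_latents(rho: float, n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n (z_obj, z_ctx) pairs from an ASSUMED standard bivariate Gaussian.

    Both latents have zero mean and unit variance; their correlation is rho.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        z_obj = e1
        # Row 2 of the Cholesky factor of [[1, rho], [rho, 1]] mixes in
        # independent noise so that corr(z_obj, z_ctx) = rho.
        z_ctx = rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        pairs.append((z_obj, z_ctx))
    return pairs


def sample_correlation(pairs: List[Tuple[float, float]]) -> float:
    """Pearson correlation of the sampled pairs, used to check the construction."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)


pairs = sample_correlated_latents(rho=0.8, n=20000, seed=1)
estimated_rho = sample_correlation(pairs)  # should land close to 0.8
```

The Cholesky construction is one standard way to couple two Gaussian variables; with `rho = 0` the object and context latents become independent, which is the case the abstract argues against when it stresses modeling the object-context relation.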


Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Softmax