Context-LGM: Leveraging Object-Context Relation for Context-Aware Object Recognition
Mingzhou Liu, Xinwei Sun, Fandong Zhang, Yizhou Yu, Yizhou Wang

TL;DR
This paper introduces Context-LGM, a hierarchical generative model that explicitly captures object-context relations using a latent variable approach and Transformer-based inference, improving context-aware object recognition accuracy.
Contribution
The novel hierarchical latent generative model explicitly models object-context relations and employs a Transformer for contextual inference, advancing context-aware recognition methods.
Findings
Achieved state-of-the-art results on lung cancer prediction.
Improved emotion recognition accuracy.
Effectively models object-context relations with latent variables.
Abstract
Context, which refers to situational factors related to the object of interest, can help infer the object's states or properties in visual recognition. Because such contextual features are too diverse (across instances) to be annotated, existing attempts simply exploit image labels as supervision to learn them, resulting in various contextual tricks, such as feature pyramids, context attention, etc. However, without carefully modeling the context's properties, especially its relation to the object, the estimated context can be highly inaccurate. To address this problem, we propose a novel Contextual Latent Generative Model (Context-LGM), which considers the object-context relation and models it in a hierarchical manner. Specifically, we first introduce a latent generative model with a pair of correlated latent variables to model the object and context, respectively, and embed their…
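To make the phrase "a pair of correlated latent variables" concrete, the following is a minimal illustrative sketch, not the paper's construction. It assumes scalar standard-normal latents and a single correlation parameter `rho`; the names `sample_correlated_latents`, `z_obj`, `z_ctx`, and `pearson` are hypothetical and chosen only for this example. The abstract does not specify the actual prior, dimensionality, or hierarchy used by Context-LGM.

```python
import math
import random


def sample_correlated_latents(rho, n, seed=0):
    """Draw n pairs (z_obj, z_ctx) of standard-normal scalars with correlation rho.

    Assumption for illustration only: a bivariate Gaussian prior. The paper's
    actual latent model is not described in the truncated abstract.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)  # independent noise source 1
        e2 = rng.gauss(0.0, 1.0)  # independent noise source 2
        z_obj = e1                # "object" latent
        # "context" latent shares e1 with the object latent, so Corr = rho
        z_ctx = rho * e1 + math.sqrt(1.0 - rho * rho) * e2
        pairs.append((z_obj, z_ctx))
    return pairs


def pearson(pairs):
    """Empirical Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / math.sqrt(vx * vy)
```

Calling `pearson(sample_correlated_latents(0.8, 20000))` should return a value close to 0.8, showing that the two latents carry shared information. With `rho = 0` the object and context latents would be independent, which is the situation the abstract argues against.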
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Softmax
